.. _downstream: RNA-Seq downstream analysis =========================== In a typical RNA-seq analysis, it is relatively straightforward to go from raw reads to read counts in features to importing them into R. After that however, expression analysis gets a bit more complicated and highly depends on the design of the experiment. We attempted to strike the balance between simplicity -- where as much configuration as possible takes place via a config file -- and flexibility where the R code can be modified as needed depending on the project. This file is ``workflows/rnaseq/downstream/rnaseq.Rmd``. It uses a separate conda environment that just has the R dependencies. It is rendered via ``knitr`` to create an HTML file. The inputs for the rule are the featureCounts output, the sample table, the ``lib/lcdbwf`` R package, and the Rmd. .. warning:: This RMarkdown file is **intended to be edited and customized per experiment**. How to use this code ~~~~~~~~~~~~~~~~~~~~ 1. Activate the ``env-r`` conda environment (created as part of setting up the `lcdb-wf` deployment) 2. Edit the :file:`workflows/rnaseq/downstream/config.yaml` file. It is heavily commented and should be self-explanatory. 3. Customize the contrasts you want to run (see below for details on this) 4. From the :file:`workflows/rnaseq/downstream` directory, run ``rmarkdown::render("rnaseq.Rmd")`` to get :file:`rnaseq.html` Here are some additional notes: - Many of the code chunks have the ``cache=TRUE`` option to speed up re-rendering and make iterative development quicker. When everything's in a final state, you may want to delete the ``rnaseq_cache`` directory and re-run. - Many of the cached code chunks also specify a config argument. These config items are taken from the :file:`config.yaml` file living alongside the :file:`rnaseq.Rmd`. If a cached chunk specifies a config option, and the value in the config file changes, the chunk will be re-run because its cache is invalidated. - As with many analyses in R, the work is highly iterative. You may want to consider using an interactive interpreter, either via the command line or RStudio. To ensure that RStudio is using the same packages as the workflows, you should set the ``RSTUDIO_WHICH_R`` environment variable. The easiest way to do this is to activate the conda environment you're using for the analysis, then export the identified location of R to that variable: .. code-block:: bash source activate lcdb-wf export RSTUDIO_WHICH_R=$(which R) On MacOS, you may additionally need the following: .. code-block:: bash launchctl setenv RSTUDIO_WHICH_R $RSTUDIO_WHICH_R Then run RStudio, which should pick up the conda environment's version of R and which will already have packages like DESeq2 installed in the environment. More details ~~~~~~~~~~~~ For more detailed documentation, see :ref:`downstream-detailed`. .. toctree:: :maxdepth: 2 rnaseq-rmd