RNA-Seq downstream analysis
In a typical RNA-seq analysis, it is relatively straightforward to go from raw reads to read counts in features and then import those counts into R. After that, however, expression analysis becomes more complicated and depends heavily on the design of the experiment.
We attempted to strike a balance between simplicity, where as much configuration as possible takes place via a config file, and flexibility, where the R code can be modified as needed depending on the project.
This file is workflows/rnaseq/downstream/rnaseq.Rmd. It uses a separate conda environment that just has the R dependencies. It is rendered via knitr to create an HTML file. The inputs for the rule are the featureCounts output, the sample table, the lib/lcdbwf R package, and the Rmd.
Warning
This RMarkdown file is intended to be edited and customized per experiment.
How to use this code
1. Activate the env-r conda environment (created as part of setting up the lcdb-wf deployment).
2. Edit the workflows/rnaseq/downstream/config.yaml file. It is heavily commented and should be self-explanatory.
3. Customize the contrasts you want to run (see below for details on this).
4. From the workflows/rnaseq/downstream directory, run rmarkdown::render("rnaseq.Rmd") to get rnaseq.html.
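The final rendering step can also be run from the shell instead of an R session. The following is a minimal sketch, not part of lcdb-wf: the function name render_downstream is hypothetical, and it assumes the R conda environment is already active so that Rscript and the rmarkdown package are available.

```shell
# Hypothetical helper (not part of lcdb-wf): render the downstream report.
# Assumes the R conda environment is already active, so that Rscript and
# the rmarkdown package can be found.
render_downstream() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found; activate the R conda environment first" >&2
        return 1
    fi

    # The first argument is the directory holding rnaseq.Rmd and config.yaml;
    # it defaults to the path used in this documentation, relative to the
    # deployment root. The subshell leaves the caller's working directory
    # unchanged. rmarkdown::render() writes rnaseq.html next to rnaseq.Rmd.
    (cd "${1:-workflows/rnaseq/downstream}" &&
        Rscript -e 'rmarkdown::render("rnaseq.Rmd")')
}
```

Called with no arguments from the deployment root, render_downstream renders the default location; passing a directory renders the copy of rnaseq.Rmd found there.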
Here are some additional notes:
- Many of the code chunks have the cache=TRUE option to speed up re-rendering and make iterative development quicker. When everything’s in a final state, you may want to delete the rnaseq_cache directory and re-run.
- Many of the cached code chunks also specify a config argument. These config items are taken from the config.yaml file living alongside the rnaseq.Rmd. If a cached chunk specifies a config option, and the value in the config file changes, the chunk will be re-run because its cache is invalidated.
- As with many analyses in R, the work is highly iterative. You may want to consider using an interactive interpreter, either via the command line or RStudio. To ensure that RStudio is using the same packages as the workflows, you should set the RSTUDIO_WHICH_R environment variable.

The easiest way to do this is to activate the conda environment you’re using for the analysis, then export the identified location of R to that variable:
source activate lcdb-wf
export RSTUDIO_WHICH_R=$(which R)
On macOS, you may additionally need the following:
launchctl setenv RSTUDIO_WHICH_R $RSTUDIO_WHICH_R
Then run RStudio, which should pick up the conda environment’s version of R, where packages like DESeq2 are already installed.
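The RStudio setup described above can be wrapped in a single helper that applies the macOS step only when it is needed. This is a minimal sketch under stated assumptions: the function name use_conda_r_in_rstudio is hypothetical, the conda environment is assumed to be active already, and `command -v` is used in place of `which` to locate R.

```shell
# Hypothetical helper (not part of lcdb-wf): point RStudio at the R from the
# currently active conda environment.
use_conda_r_in_rstudio() {
    # Locate R on PATH; `command -v` is the POSIX counterpart of `which`.
    r_path=$(command -v R) || {
        echo "R not found; activate the conda environment first" >&2
        return 1
    }
    export RSTUDIO_WHICH_R="$r_path"

    # On macOS, also register the variable with launchd, since applications
    # started outside the terminal may not see the shell's environment.
    if [ "$(uname -s)" = "Darwin" ]; then
        launchctl setenv RSTUDIO_WHICH_R "$RSTUDIO_WHICH_R" ||
            echo "launchctl setenv failed; RStudio may not see the variable" >&2
    fi
}
```

The function has to run in the current shell (for example after sourcing the file that defines it) so that the exported variable is still set when RStudio is started from that same shell.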
More details
For more detailed documentation, see Detailed documentation of RNA-Seq downstream.