RNA-Seq downstream analysis

In a typical RNA-seq analysis, it is relatively straightforward to go from raw reads to per-feature read counts and then import those counts into R. After that, however, expression analysis becomes more complicated and depends heavily on the design of the experiment.

We attempted to strike a balance between simplicity, where as much configuration as possible takes place via a config file, and flexibility, where the R code can be modified as needed depending on the project.

The downstream analysis lives in workflows/rnaseq/downstream/rnaseq.Rmd. It uses a separate conda environment that contains only the R dependencies, and it is rendered via knitr to create an HTML file. The inputs for the rule are the featureCounts output, the sample table, the lib/lcdbwf R package, and the Rmd itself.

Warning

This RMarkdown file is intended to be edited and customized per experiment.

How to use this code

  1. Activate the env-r conda environment (created as part of setting up the lcdb-wf deployment)

  2. Edit the workflows/rnaseq/downstream/config.yaml file. It is heavily commented and should be self-explanatory.

  3. Customize the contrasts you want to run (see below for details on this)

  4. From the workflows/rnaseq/downstream directory, run rmarkdown::render("rnaseq.Rmd") to get rnaseq.html
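Steps 1 and 4 can also be run non-interactively from a shell. The following is a minimal sketch, not part of lcdb-wf: the function name render_rnaseq is hypothetical, and it assumes the R environment from step 1 is already active so that Rscript and the rmarkdown package are available.

```shell
# Hypothetical helper (not part of lcdb-wf): render rnaseq.Rmd from the shell.
# Assumes the R conda environment is already activated.
# The function body is a subshell, so the caller's working directory
# and variables are left untouched.
render_rnaseq() (
    # Directory containing rnaseq.Rmd and config.yaml; defaults to the
    # path given in this document, relative to the current directory.
    dir="${1:-workflows/rnaseq/downstream}"
    cd "$dir" || exit 1
    # Same call as in step 4, passed to Rscript as an expression.
    Rscript -e 'rmarkdown::render("rnaseq.Rmd")'
)
```

Calling render_rnaseq with no arguments from the top of the deployment, or render_rnaseq . from inside the downstream directory, should leave rnaseq.html next to rnaseq.Rmd.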

Here are some additional notes:

  • Many of the code chunks have the cache=TRUE option to speed up re-rendering and make iterative development quicker. When everything’s in a final state, you may want to delete the rnaseq_cache directory and re-run.

  • Many of the cached code chunks also specify a config argument. These config items are taken from the config.yaml file living alongside the rnaseq.Rmd. If a cached chunk specifies a config option, and the value in the config file changes, the chunk will be re-run because its cache is invalidated.

  • As with many analyses in R, the work is highly iterative. You may want to consider using an interactive interpreter, either via the command line or RStudio. To ensure that RStudio is using the same packages as the workflows, you should set the RSTUDIO_WHICH_R environment variable.

    The easiest way to do this is to activate the conda environment you’re using for the analysis, then export the identified location of R to that variable:

    # With conda 4.4 or later, "conda activate lcdb-wf" is the equivalent command.
    source activate lcdb-wf
    export RSTUDIO_WHICH_R=$(which R)
    

    On macOS, you may additionally need the following:

    launchctl setenv RSTUDIO_WHICH_R "$RSTUDIO_WHICH_R"
    

    Then start RStudio. It should pick up the conda environment’s version of R, which already has packages like DESeq2 installed.
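Before launching RStudio, it can be worth confirming that RSTUDIO_WHICH_R really points at the R on the current PATH. The check below is a sketch under the assumption that the analysis environment is active in the current shell; the function name check_rstudio_r is hypothetical and not part of lcdb-wf.

```shell
# Hypothetical helper: compare RSTUDIO_WHICH_R with the R found on PATH.
# Returns 0 when they match and non-zero otherwise.
# The function body is a subshell, so no variables leak into the caller.
check_rstudio_r() (
    # Location of the R executable in the currently active environment.
    expected=$(command -v R) || {
        echo "R not found on PATH; is the conda environment active?" >&2
        exit 1
    }
    if [ "${RSTUDIO_WHICH_R:-}" = "$expected" ]; then
        echo "RSTUDIO_WHICH_R matches $expected"
    else
        echo "RSTUDIO_WHICH_R is '${RSTUDIO_WHICH_R:-}', expected '$expected'" >&2
        exit 1
    fi
)
```

If the check reports a mismatch, re-running the export line shown above in the same shell should bring the two back in sync.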

More details

For more detailed documentation, see Detailed documentation of RNA-Seq downstream.