Changelog¶
v1.10.3¶
improve the deploy script (thanks @aliciaaevans)
support the epic2 peak-caller for the ChIP-seq workflow (thanks @Mira0507)
for later versions of featureCounts, add
--countReadPairs
argument to RNA-seq workflow (@therealgenna)
v1.10.2¶
Minor bugfix release.
Fix multiqc configs so that they coorectly ignore any cutadapt fastqc zips when building the raw fastq section
Fix multiqc config for chipseq so it correctly cleans the
_R2
extension to better support PE ChIP-seq-like workflowsFix functional enrichment label truncation to ensure that truncated labels are unique
v1.10.1¶
This is a bugfix and minor patch release.
Bugfix: the references workflow was missing the
resources:
directives; they have now been added.Bugfix: kallisto strandedness was set incorrectly for libraries using ligation prep (fr-secondstrand)
The new
utils.autobump
function can be used to easily specify default and incremented resources, and theutils.gb
andutils.hours
make it a little easier to specify when autobump is not required.In the following example, memory will be set to 8 * 1024 MB and will increment by that much each retry. The runtime will be set to 2 * 60 minutes, and will increment by 10 * 60 minutes each retry. The disk will be set to 100 * 1024 MB, and will not increase each retry.
resources: mem_mb=autobump(gb=8), runtime=autobump(hours=2, increment_hours=10), disk_mb=gb(100)
WRAPPER_SLURM no longer has the
--latency-wait=300
,--max-jobs-per-second=1
, and--max-status-checks-per-second=0.01
which would override any profile settings.In RNA-seq and ChIP-seq, the cutadpt rule now defaults to using
--nextseq-trim 20
instead of-q 20
, to better handle the majority of sequencing data we have recently been working with (NovaSeq). See this section of the cutadapt docs for details.Updated requirements to use a recent version of salmon to avoid segfaults
rnaseq.Rmd, when saving the Rds file at the end, now disables compression. This can have a dramatic improvement on downstream performance for a reasonable disk space cost.
functional-enrichment.Rmd, now supports KEGG pathways & parallel operation.
functional-enrichment.Rmd, gene-patterns.Rmd, now saves Rds file at the end (without compression) adding the respective object lists.
added
--overlap 6
to cutadapt to avoid greedy trimming
v1.10¶
The major change here is refactoring the Snakefiles to use the resources:
directive in each rule, and removing the --clusterconfig
mechanism which
has long been deprecated.
For running on a cluster, this requires a profile. E.g., on NIH’s Biowulf, use the NIH-HPC snakemake_profile.
General¶
No longer using clusterconfig, instead using resources to configure cluster resources
Migrated to a unified testing script that simplifies local and CI testing
If sampletable is from SRA, raise an error if a Layout column can’t be found (to prevent incorrect interpretation of samples as single-end)
Ensure bam indexes are made for the markdups bams, even if bigwigs are not created
Remove libsizes table, which was largely redundant with fastqc results
RNA-seq¶
Fix R tests
All
lcdbwf
R functions use the:::
namespace lookup syntaxFix library loads in rnaseq.Rmd to ensure they come before parallelization configuration
New function
lcdbwf:::lfc_scatter
for comparing multiple DESeq2 contrastsUpdates and fixes to
gene-patterns.Rmd
v1.9¶
This version has substantial changes in the rnaseq.Rmd
file to streamline
its use in a production environment. This involves moving most of the code
complexity into the lcdbwf
R package and using a new config file as much as
possible. See details below.
General¶
environments have been updated with recent versions of all tools
WRAPPER_SLURM arguments updated with arguments better suited for cluster submission
PhiX reference configs have been removed
compatibility with Python 3.10
fastq-dump rules have been converted to scripts. This is because sra-tools in versions earlier than 3.0 have issue with SSL certs, however sra-tools=3 cannot be installed alongside recent versions of salmon (due to conflicting pinnings with the
icu
package). Therefore, fastq-dump is now run as a script in its own conda environment.new idxstats rule for chipseq and rnaseq
RNA-seq¶
This version has major changes to rnaseq.Rmd
. Briefly:
This file has been overhauled to be driven by a config file. This dramatically reduces the need to scroll through the RMarkdown file and make all the customizations for a particular experiment. Now, editing the config file sets up most of the project-specific components. Note that contrasts still need to be customized in the Rmd file.
The narrative and explanatory text has been moved to
text.yaml
and is included at render time. This reduces the need to scroll through lots of boilerplate text in the RMarkdown while still retaining the ability to easily edit it.Most of the complexity has been offloaded to the
lcdbwf
R package.Caches are much improved. See the Detailed documentation of RNA-Seq downstream section for more information.
Functional enrichment is moved into a separate RMarkdown file.
Downstream RNA-seq config¶
The file, workflows/rnaseq/downstream/config.yaml is heavily commented to describe the various settings. The sections of the config are designed such that they can be used as additional chunk options to chunks in which they are used. This additional chunk option is used by RMarkdown to compute the hash of the chunk. The result is that making a change in the config file is sufficient to invalidate the cache of any chunks that specify that section as a chunk option.
Complexity moved to lib/lcdbwf/R
¶
Another major change is that most of the complexity in the rnaseq.Rmd
file
has been factored out into the lcdbwf
R package that is stored inn
lib/lcdbwf
. While this means that all code is no longer included in the
final rendered HTML file, it does make the Rmd much more streamlined to work
with. It also has the side effect of making it easier to write unit tests on
separate functions.
Many helper functions have been added to the lcdbwf
R package, including
ones to streamline the creation of dds and results objects, composing and saving
them, and generating many of the outputs.
Improved caching of results chunks¶
A somewhat major change is a new strategy for allowing results()
calls to be
split across multiple, independently-cached chunks that are then properly merged
together into a single res.list
object while handling dependencies and
parallelization (thanks to @njohnso6). This
dramatically speeds up the process of incrementally adding contrasts to complex
experimental designs.
Other changes¶
In addition to these major changes, there are also many other improvements
to rnaseq.Rmd
:
AnnotationHub databases are only retrieved from cache when they are needed. This dramatically speeds up rendering of the HTML, since before the OrgDb would always load no matter what.
Toggle Kallisto or Salmon quantification with a simple true/false; this automatically sums to gene level using automatically retrieved TxDb. This also now supports creating dds objects from featureCounts, Salmon, or Kallisto in such a way that they can be easily compared with each other.
lcdbwf::compose_results()
to combine res_list and dds_list objects together by inspecting the global namespace for specially-named objectsHelper functions for retrieving global config and data structures (e.g.,
lcdbwf::get_config()
,lcdbwf::get_dds()
)Helper function
lcdbwf::match_from_dots
for working with … arguments and splitting them up to only go to the functions they are intended forMuch faster to attach info (e.g., adding SYMBOL to all results) since the AnnotationDbi calls are only done once instead of for each results object.
Refactored functional enrichment to be much more generalized, currently using Gene Ontology and MSigDB. MSigDb, via the
msigdbr
package, is available for multiple species and so this incorporates Reactome and KEGG. But the generalized method can be applied to any arbitrary gene sets, allowing for much more customization.Fixes to clusterProfiler::emapplot calls in particular corner cases
Functional enrichment is now a completely separate file, using the
combined.Rds
file as an intermediate betweenrnaseq.Rmd
andfunctional_enrichment.Rmd
.All-in-one enrichment function that runs either overrepresentation or GSEA. Makes it much easier to do ad hoc tests.
Helper function
lcdbwf::enrich_list_lapply()
to apply arbitrary functions to the highly-nested enrich_list data structureHelper function
lcdbwf::collect_objects
to help compile discovered results objects
lcdbwf::get_sig()
has more options for what to returnPlotting wrappers for clusterProfiler plot functions, allowing plots to be configured via the config file.
New dds diagnostics and results diagnostics functions and sections of the Rmd, useful for troubleshooting
Refactored the results tabs: MA plots come first; ensure 10 genes are always plotted in MA plots, added volcano plots with labeled genes, removed top 3 and bottom 3 gene plots
PCA plots using plotly no longer need “unrolled” for-loops; multiple PCA coloring and clustered heatmap row side colors are now configured in the YAML config file
Moved size factor plots and gene version removal to lcdbwf package
Use datatable to show initial sampletable for cleaner output
Make original dds_initial object the same way as later dds objects and always using a design of
~1
to be used in PCA and heatmaps“Differential expression” header moved so that code is no longer hidden under the size factors plot
Option for filling in NA in symbol with Ensembl IDs
collapseReplicates2 uses
collapse_by
rather thancombine.by
Updated the code style throughout to use the tidyverse/google style guide
RNA-seq differential expression output is additionally included in an Excel file with one sheet per contrast.
Tests¶
lcdbwf
R package now has its own tests viadevtools
andtestthat
recent versions of Snakemake are broken when
--until
is used in certain circumstances; a ChIP-seq test has been disabled temporarily.after a successful test, the environment is written as an artifact on circleci
References¶
Fixed a longstanding issue with S. cerevisiae, now the GFF file is properly converted to GTF.
v1.8¶
General¶
Complete shift to using pinned
env.yaml
files to specify conda environments, and usingmamba
for building environments (consistent with recent versions of Snakemake). This is now reflected in documentation and the updated-and-improveddeploy.py
.Reorganization/cleanup of the
include
directoryAdded conda troubleshooting notes to the documentation (see Troubleshooting environments).
The
lib.helpers.preflight
function no requires the first column of the sampletable to be named samplename when checking configs.Improvements to the deployment script
deploy.py
:now requires Python >3.6
proper logs (so you can easily see how long it takes to build an env)
supports downloading and running the script directly, which will clone a temporary copy and deploy from there
using Control-C to stop the deployment will also stop mamba/conda
colored output
mamba is used by default, but
--conda-frontend
will use conda instead
fastq-dump log is sent to file rather than printed to stdout
Threads: cutadapt single-end now uses specified threads (it was using 1 thread by default); use 6 threads for fastqc
Added new preflight checks for RNA-seq and ChIP-seq specific configs.
Added a
run_complex_test.sh
driver script for testing the workflows on full-scale publicly available data
RNA-seq¶
Configuration change: The
stranded:
field is now required for RNA-seq. This is used to choose the correct parameters for various rules, and avoids one of the main reasons to edit the Snakefile. See stranded field for more details on its use.added
stranded:
field to all configs used in testingThe
strand_check
rule now runs MultiQC for a convenient way of evaluating strandedness of a library.Kallisto is now supported in both the RNA-seq Snakefile, references Snakefile, included reference configs, and downstream
rnaseq.Rmd
References¶
When checking URLs in reference configs, don’t use
curl
to checkfile://
URIs.There is a new feature for reference configs that allows chaining post-processing functions together, see More advanced postprocessing. This means that it is possible, for example, to add ERCC spike-ins (which need post-processing) onto references that themselves need post-processing.
lib/postprocess/ercc.py
has new helper functions for adding ERCC spike-ins to fasta files and GTF files.added
'kallisto'
to included reference configs
ChIP-seq¶
symlinks rule is now local
added collectinsertsizes pattern to support PE ChIP-seq experiments
merging bigwigs log no longer goes to stdout
v1.7¶
Setup¶
Use mamba for installation of environments, consistent with Snakemake recommendations
Testing¶
We now recommend using mamba to create conda environments. This is dramatically faster and solves some dependency issues. Our automated tests now use this.
We have moved from requirements.txt files to env.yaml files. We also now encourage the use of the strictly-pinned environments for a more stable experience to hopefully avoid transient issues in the packaging ecosystem.
tbb=2020.2
as a dependency to fix a recent packaging issue with conda-forge.many documentation improvements
symlinks rule is only set to localrule when it exists (it does not exist when running an analysis exclusively from SRA)
References¶
updated URLs for those that have changes (e.g., Sanger -> EBI; using https instead of ftp for UCSC-hosted genomes)
new
gff2gtf
post-process tool for when an annotation is only available as GFF. S. pombe needs this, for example, and the Schizosaccharomyces_pombe.yaml` reference config has been updated accordingly.The references workflow no longer reads the config file in its directory. This fixes some subtle overwriting issues when providing config files or items from the command line during as is used during certain test runs. If running the references workflow alone, it must be called with
--configfile
RNA-seq¶
featureCounts now uses BAM files with duplicates marked. Previously if you wanted to run featureCounts in a mode where it excluded duplicates you would need to reconfigure rules.
improved comments in RNA-seq downstream RMarkdown files
Testing¶
new test that checks all URLs identified in config files to ensure that the included reference files remain valid
there is now a separate
run_downstream_test
script`simplified the CircleCI DAG to optimize testing resources
v1.6¶
References¶
overhaul the way transcriptome fastas are created. Instead of requiring separate download, they are now created out of the provided GTF and fasta files. The reference config section now uses keys
genome:
,transcriptome:
, andannotation:
rather than thefasta:
andgtf:
keys.backwards-incompatible change: reference config files have been updated to reflect the changes in the references workflow
Update PhiX genome fasta to use NCBI rather than Illumina iGenomes
ChIP-seq workflow¶
ChIP-seq workflow now properly supports paired-end reads
A ChIP-seq workflow can now be run when the
chipseq:
and/orpeak_calling:
sections are omitted.added a missing bowtie2 config entry in
clusterconfig.yaml
which could result in out-of-memory errors when submitting to a cluster using that file
RNA-seq workflow¶
if colData is a tibble this no longer causes issues for importing counts
dupRadar removed from RNA-seq workflow. We ended up never using it, and it depends on R which we’ve since removed from the main environment.
new
strand_test
rule, which can be run explicitly withsnakemake -j2 strand_check
. This generatesstrandedness.tsv
in the current directory, which is the summarize output of RSeQC’sinfer_experiment.py
across all samples.implement STAR two-pass alignment. Default is still single-pass.
Clean up hard-coded STAR indexing Log.out file
Include
ashr
andihw
Bioconductor packages in the R requirements, for use with recent versions of DESeq2.
RNA-seq downstream¶
Functional enrichment and gene patterns are now separate child documents. This makes it easier to turn them on/off by only needing to adjust the chunk options of the child chunk
Created a new documentation method for rnaseq.Rmd. Now there is a separate, dedicated documentation page with sections that exactly correspond to each named chunk in the Rmd, as well as a tool for ensuring that chunks and docs stay synchronized. See global_options for the new docs.
New
counts.df
andcounts.plot
functions to make it much easier to make custom dotplots of top counts by melting and joining the counts table with the metadata in colData.DEGpatterns cluster IDs are now added as additional columns in the output TSVs for each contrast
Many functions in the rnaseq.Rmd now expect a list of dds objects. See dds_list for more info on this.
Created a new R package,
lcdbwf
stored inlib/lcdbwf
. This can be edited in place, and it is loaded from disk withinrnaseq.Rmd
.Modified some output keys to support recent versions of Snakemake, for which
count
is a reserved keyword
General¶
Conda environments are now split into R and non-R. See conda and conda envs in lcdb-wf for details. Updated
deploy.py
accordinglysymlinks rules are now set to be localrules
updated workflows to work on recent Snakemake versions
split environments into non-R and R. This, along with a loose pinning of versions (
>=
), dramatically speeds up environment creation.updates to support latest Snakemake versions
- improvements to testing:
environment YAML files, rendered HTML, and docs are stored as artifacts on CircleCI
consolidations of some RNA-seq tests to reduce total time
additional comments in the test config yaml to help new users understand the system
new “preflight check” function is run to hopefully catch errors before running workflows
updates to support recent Picard versions
added wildcard constraints to help Snakemake solve DAG
v1.5.3¶
General¶
Bugs¶
add bed12 conversion for all species with default reference configs
presence of an orig_filename_R2 in sampletable is sufficient to consider the experiment PE
ensure DEGpattern output only contains unique genes
bring back featurecounts in multiqc report
“attach” chunk in rnaseq.Rmd was not properly set to depend on the “results” chunk
RNA-seq¶
dds objects can now be created from a full featureCounts input file and a subsetted colData table, if subset.counts=TRUE
improve the dependencies between rnaseq.Rmd chunks so that cache=TRUE behaves as expected: (#232)
add plots for rnaseq.Rmd size factors (#222)
run rseqc instead of CollectRnaSeqMetrics (the multiqc output is nicer for it, and it’s pretty much doing the same thing) (#218)
when converting Ensembl to symbol, if there is no symbol then fall back to the Ensembl ID to avoid NA (#246)
in rnaseq.Rmd, all caches will be invalidated if the sampletable or the featurecounts table have changed.
Tests¶
using continuumio/miniconda3 container; finally got en_US.utf8 locale installed and working correctly in that container so that multiqc works.
v1.5.2¶
Bug fixes¶
When some samples were substrings of other samples (e.g., WT_1_1 and WT_1_10), DESeqDataSetFromCombinedFeatureCounts was assigning the wrong names. This has now been fixed in helpers.Rmd.
v1.5.1¶
Bug fixes¶
DESeqDataSetFromCombinedFeatureCounts (added in v1.5) was incorrectly assigning labels to samples when the order of the sampletable did not match the order of the samples in the featureCounts table columns. This has been fixed.
General¶
deploy.py deployment script now only pays attention to files checked in to version control and optionally can create a conda environment in the target directory.
tests now work out of a newly-deployed instance to better reflect real-world usage
ChIP-seq and RNA-seq¶
reorder cutadapt commands to avoid a MultQC parsing bug in the cutadapt module (see https://github.com/ewels/MultiQC/issues/949)
RNA-seq¶
The majority of these changes affect rnaseq.Rmd
:
modifications to MultiQC config to get back featureCounts output
plotMA.label function (in
helpers.Rmd
) now defaults to FDR < 0.1 (instead of 0.01), and additionally supports labeling using different columns of the results object (e.g., “symbol”).remove some now-redundant featureCounts code
add a comment showing where to collapse replicates
convert colData’s first column to rownames
implement lower limit for DEGpatterns clustering (default is 0, but can easily set to higher if you’re getting issues)
expose arbitrary additional function arguments to
top.plots
. This allows different intgroup arguments to be passed to the my.counts function, enabling different ways of plotting the gene dotplots.
v1.5 (Sept 2019)¶
Major change: it is no longer possible to mix single-end and paired-end samples within the same run of the workflow. See #208 and the corresponding issue description at #175.
This version also has many improvements to the rnaseq.Rmd
file for RNA-seq,
as described below.
RNA-seq¶
Many changes and improvements to rnaseq.Rmd
, including:
Differential analysis summaries now include labeled MA plots (#192)
PCA plots now use plotly for improved insepction of samples (#192
don’t use knitrBootstrap any more (#192
heatmaps use heatmaply package for better interaction (#192
allow
sel.list
to be used for UpSet plots and fix some typos #205workaround for degPatterns for corner cases where there are few clusters because of the
minc
parameter (#205)alpha and lfc.thresh are now pulled out into a separate chunk (#206)
Support AnnotationHub http proxy handling in new version of AnnotationHub (#207).
As well as the following changes to other parts of the RNA-seq workflow, such as:
References¶
fix typo in S. pombe name
All workflows¶
Documentation now recommends creating an environment for each directory using the -p argument (#195)
v1.4.2 (Jul 2019)¶
Bugfixes¶
Don’t require ChIP-seq configs to have at least one block for each supported peak-caller
v1.4.1 (Jul 2019)¶
RNA-seq¶
KEGG results were not being added to the
all.enrich
list inrnaseq.Rmd
symlinking bigWigs is now a local rule
default cutadapt options have changed to reflect current recommendations from the author, and the cutadapt rule is now explicity using arguments rather than requiring a separate
adapters.fa
file.featureCounts now auto-detects whether it should be run with the
-p
argument in paired-end mode (previously it was up to the user to make sure this was added). The rule does have an override if this behavior is not wanted.
References¶
The reference config for Drosophila is now fixed. Previously it depended on chrom_convert. That script was a fly-specific script in lcdblib, but lcdblib is no longer a dependency since v1.3. This fix uses the convert_fastq_chroms and convert_gtf_chroms used in reference configs for other species.
v1.4 (May 2019)¶
RNA-seq¶
Much-improved rnaseq.Rmd
:
tabbed PCA plot
improved DEGpatterns chunk
dramatically improved functional enrichment section, with tabbed clusterprofiler plots and exported data in two flavors (combined and split)
improved upset plots, with exported files showing sets of genes
improved comments to highlight where to make changes
- add new helper functions to
helpers.R
: fromList.with.names
, for getting UpSet plot outputrownames.first.col
, to make tidier dataframesnested.lapply
, for convenient 2-level nested list applyclusterprofiler helper functions
- add new helper functions to
v1.3 (May 2019)¶
Bugfixes¶
Fix broken paired-end support for RNA-seq. Previously, when using data from elsewhere on disk and using the symlink rules, R2 would be symlinked to the same file as R1.
Support for Snakemake 5.4.0 which changes behavior of the
expand()
function.
Infrastructure¶
new deploy script to copy over only the files necessary for an analysis, avoiding the clutter of testing infrastructure.
lcdblib, an external package, is no longer a dependency. In the interest of better transparency and to make the code here easier to follow, the relevant code from lcdblib was copied over to the
lib
directory in this repository.
ChIP-seq and RNA-seq¶
Bowtie2, HISAT2, and rRNA rules no longer use wrappers. This makes it easier to track down what parameters are being used in each rule.
RSeQC is now available in Python 3 so wrappers have been removed.
NextGenMap support removed
v1.2 (Mar 2019)¶
RNA-seq¶
First-class paired-end support, including mixing PE and SE samples in the same sampletable
Support for STAR aligner
References¶
FASTA files are always symlinked into the directories of indexes that were created from it
Reference configs:
updated existing
added more species
new post-process for fasta or gtf: you can now use NICHD-BSPC/chrom-name-mappings to convert chromosome names between UCSC and Ensembl (see reference configs for examples of use)
ChIP-seq and RNA-seq¶
Updates to dependencies and MultiQC config
Infrastructure¶
Updated requirements in
requirements.txt
and in wrappersChanged all
pd.read_table()
topd.read_csv(sep="\t")
to prevent warningsChanged all
yaml.load()
toyaml.load(Loader=yaml.FullLoader)
to prevent warningsUsing DeprecationWarning rather than UserWarning in the deprecation handler so there’s less spam in the logs
Improved tests:
using data from pybedtools repo because modENCODE seems to be down
append rather than prepend base conda to PATH on circleci
separate isolated tests for STAR, ngm, and SRA
updated conda
Docs additions:
TMPDIR handling
clusterconfig
WRAPPER_SLURM
docs for developers
symlinking fastqs
using SRA sampletables
paired-end data
Colocalization¶
From colocalization, removed the GAT “fractions” heatmap due to unresolved pandas index errors
v1.1 (Aug 2018)¶
Infrastructure¶
The default settings in Snakefiles are for real-world use, rather than for testing. This reduces the amount of editing necessary before running actual data. See A note about test settings for the extra step to take when testing locally.
new
run_test.sh
script in each workflow directory to automatically run the preprocessor when running test dataadded extensive comments to Snakefiles with
NOTE:
string to make it obvious where and how to make changes.Documentation overhaul to bring everything up to v1.1. This includes Sphinx autodocs on the
lib
module.pytest test suite is run on the
lib
module
References¶
new metadata section in references config, which can be used to store additional information like mappable bases and genome size.
References can now be included from other YAML files into the main config file. This dramatically simplifies individual configfiles, and allows multiple workflows to use identical references without having to do error-prone and hard-to-maintain copy/pastes between workflow configs. See References config for details.
New GTF conversion,
mappings
. This is intended to replace theannotation_hub
conversion, which was problematic because 1) a particular annotation hub accession is not guaranteed to be found in new versions of AnnotationHub, resulting in lack of reproducibility, and 2) it was difficult to synchronize the results with a particular GTF annotation. Theannotation_hub
conversion is still supported, but if it’s used then a DeprecationWarning will be emitted, recommendingmappings
instead.
Both RNA-seq and ChIP-seq¶
fastq_screen is now configured via
config.yaml
. This reduces the need to edit the Snakefile and coordinate between the config and the fastq_screen rule. Now everything is done within the config file.fastq_screen wrapper now handles additional output files created when using the
--tag
and--filter
arguments tofastq_screen
.In the config file,
assembly
has been changed to the more-descriptiveorganism
. The change is backwards compatible, but a DeprecationWarning is raised ifassembly:
is still used, and changed toorganism
(though only in memory, not on disk).Patterns no longer use
{sample_dir}
,{agg_dir}
, etc placeholders that need to be configured in the config YAML. Instead, these directories are hard-coded directly into the patterns. This simplifies the config files, simplifies the patterns, and removes one layer of disconnect between the filenames and how they are determined.removed 4C workflow since it used 4c-ker
ChIP-seq¶
macs2 and sicer can accept mappable genome size overrides
RNA-seq¶
RNA-seq downstream:
downstream/help_docs.Rmd
can be included for first-time users to describe the sections of the RNA-seq analysisrnaseq.Rmd
now uses the sameNOTE:
syntax as the Snakefiles for indicating where/what to changeEasy swapping of which strand to use from the three featureCounts runs performed by the workflow
Be explicit about using DESeq2::lfcShrink as is now the default in recent DESeq2 versions
improved the mechanism for keeping together results objects, dds objects, and labels (list of lists, rather than individual list object; refactored functions to use this new structure
v1.0.1 (Jun 2018)¶
Bugfixes, last release before references changes.
Infrastructure¶
Transition to CircleCI for testing
Use production settings by default; see A note about test settings for more.
lots o’ docs
new
include/references_configs
to help organize references. These are currently not used by the workflows directly.bugfix: use additional options when uncompressing downloaded reference files (
--no-same-owner
fortar
,-f
forgunzip
)additional dependencies in the top-level environment to support the additional features in rnaseq.Rmd and track hubs.
colocalization workflow, external workflow, figures workflow to demonstrate vertical integration
RNA-seq¶
remove kallisto indexing, use salmon
improvements to how chipseq sampletables are parsed (with more informative error messages)
run preseq for RNA-seq library complexity QC
support for merging bigwigs
featureCounts is now run in all three strandedness modes, and results incorporated into MultiQC as separate modules.
RNA-seq now symlinks “pos” and “neg” bigWigs, which describe how reads map to the reference, to “sense” and “antisense” bigWigs, which describe the originating RNA. This makes it easy to swap strands depending on protocol.
new
downstream/helpers.Rmd
which factors out a lot of the work previously done inrnaseq.Rmd
into separate functions.track hub building respects new sense/antisense bigwig symlinks
downstream/rnaseq.Rmd
¶
AnnotationHub uses cache dir that will not clobber default home directory cache
use varianceStabilizingTransform instead of rlog
print a size factors table
use multiple cores for computationally expensive DESeq2 operations
using separate lists for results, dds objects, and nice labels for automated plots for each contrast
UpSet plots for comparing gene lists across contrasts
DEGpattern plots for showing clusters of expression patterns (from the DEGreport package)
attach normalized counts per sample and per factor (parsed from the model used for the contrast) as well as TPM estimates to the results tables
trim the labels in GO enrichment plots when too long
ChIP-seq¶
sicer for chipseq domain calling
pin snakemake <4.5.0 so that subworkflows behave correctly
chipseq peak-calling rules (and therefore wrappers) now expect a chromsizes file as input
bigbed files for narrowPeak and broadPeak files are created correctly depending on their format
run multiBigWigSummary and plotCorrelation from deepTools for ChIP-seq QC
ChIP-seq track hub generation script
Both RNA-seq and ChIP-seq¶
update deeptools calls to reflect >v3.0 syntax
support for SRA run tables so it’s trivial to re-run experiments in SRA
multiple FastQC runs are shown separately in MultiQC output
v1.0 (May 2018)¶
First official full release.