For Developers
Creating and updating conda envs
The env.yml and env-r.yml files contain fully-pinned versions of the environments. Pinning helps with stability and can dramatically speed up environment creation. However, these environment definitions periodically need to be updated.
To do so, create new environments using the unpinned versions in include/requirements.txt and include/requirements-r.txt. These environments may take substantially longer to create.
Then run the tests (Testing the installation) using those environments.
If all tests pass, then export the newly-created environments to the env.yml and env-r.yml files.
When you commit and push those files, the CI/CD system will detect that they have changed, trigger a rebuild of the cached environments, and then run the tests using the new environments.
Running the full complex datasets
Prior to a release, the complex datasets should be run. They test the corner cases more extensively than the standard tests. They should be run on a cluster or a machine with substantial resources. The configs can be found in include/test. Here is how to run them using WRAPPER_SLURM:
sbatch ../../include/WRAPPER_SLURM \
--configfile ../../test/test_configs/complex-dataset-rnaseq-config.yaml \
--config sampletable=../../test/test_configs/complex-dataset-rnaseq-sampletable.tsv
Module documentation
Adding a new aligner
Modules
In lib/common.py, there is a function references_dict. Within it is an index_extensions dictionary. Add the name of the aligner and the extension of the index it creates. If the aligner creates multiple index files, listing just one is sufficient. The index filename is created automatically and used as the expected output file, which can then be accessed from the references dict as references_dict[organism][tag][aligner] in any rule that needs the index as input (that is, any mapping rule).
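As a rough illustration of the idea, the following is a minimal sketch and not the actual code in lib/common.py. The helper names (index_filename, build_references_dict), the directory layout, and the "newaligner" entry with its ".idx" extension are all hypothetical; only the index_extensions name and the [organism][tag][aligner] access pattern come from the text above.

```python
# Hypothetical sketch: helper names, path layout, and the "newaligner" entry
# are illustrative assumptions, not the real contents of lib/common.py.

# Maps each aligner name to the extension of ONE index file it creates.
index_extensions = {
    "bowtie2": ".1.bt2",
    "hisat2": ".1.ht2",
    "newaligner": ".idx",  # hypothetical new aligner and extension
}


def index_filename(references_dir, organism, tag, aligner):
    """Build the expected index path for one organism/tag/aligner."""
    ext = index_extensions[aligner]
    # Assumed layout: <references_dir>/<organism>/<tag>/<aligner>/<organism>_<tag><ext>
    return f"{references_dir}/{organism}/{tag}/{aligner}/{organism}_{tag}{ext}"


def build_references_dict(references_dir, organisms):
    """Nest expected index paths so they read as d[organism][tag][aligner]."""
    d = {}
    for organism, tags in organisms.items():
        for tag, aligners in tags.items():
            for aligner in aligners:
                path = index_filename(references_dir, organism, tag, aligner)
                d.setdefault(organism, {}).setdefault(tag, {})[aligner] = path
    return d


refs = build_references_dict(
    "references", {"dmel": {"test": ["bowtie2", "newaligner"]}}
)
print(refs["dmel"]["test"]["newaligner"])
```

A mapping rule would then use the looked-up path as its index input, which is why a single representative index file per aligner is enough.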
Configuration
Add the aligner to the “indexes:” section of the include/reference_configs/test.yaml config file.
Write a rule in workflows/references/Snakefile to build the index. Use the other index-building rules there as a guide.
In each workflow the aligner is appropriate for, add a rule that runs it. Enclose the rule in an “if:” clause so that it only runs if the config file has specified that aligner.
Add the name to the list of supported aligners in docs/config-yaml.rst, in the “Aligner config” section.
Add appropriate memory/time requirements to the rule for that aligner.
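The conditional in the third step can be pictured with a small plain-Python sketch. It assumes, hypothetically, that the workflow config names the chosen aligner under config["aligner"]["index"]; check the actual config layout before relying on this. The aligner_selected helper and the "newaligner" name are also made up for illustration.

```python
# Hypothetical sketch of the "only run if this aligner was configured" check.
# Assumption: the chosen aligner is stored at config["aligner"]["index"].


def aligner_selected(config, name):
    """Return True if the config asks for the aligner called `name`."""
    return config.get("aligner", {}).get("index") == name


# Example config, as it might look after loading the workflow's YAML file.
config = {"aligner": {"index": "newaligner", "tag": "test"}}

if aligner_selected(config, "newaligner"):
    # In a Snakefile, the rule definition for the new aligner would go here,
    # so the rule only exists when the config requests that aligner.
    active_rule = "newaligner"
else:
    active_rule = None

print(active_rule)
```

Guarding the rule this way avoids having several mapping rules that all claim the same output BAM pattern.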
Testing
For testing, create a copy of the config for each workflow the aligner is used in, and change only the aligner.
Modify .circleci/config.yml to include a new block in each of the variables, jobs, and workflows sections. Use the rnaseq-star blocks as a guide for this. The idea is to only run up through the aligner step in a parallel task (to save on CI build time).
Adding a new peak-caller
First, write a wrapper for the peak-caller. You can use the macs2, spp, and sicer wrappers as a guide. A wrapper should expect one or more sorted and indexed BAM files as IP and one or more sorted and indexed BAM files as input. The wrapper should create at least a sorted BED file of peaks, and can optionally create other supplemental files as well.
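To illustrate the "sorted BED file" requirement, here is a minimal sketch of a helper a wrapper could use to sort peak records before writing them. The function name sort_bed_lines is hypothetical and is not part of the existing wrappers; it sorts by chromosome name (lexicographically), then by numeric start and end.

```python
# Hypothetical helper, not taken from the macs2/spp/sicer wrappers.


def sort_bed_lines(lines):
    """Return BED records sorted by chromosome, then start, then end."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        records.append((fields[0], int(fields[1]), int(fields[2]), fields))
    # Chromosome names compare as strings; coordinates compare as integers.
    records.sort(key=lambda r: r[:3])
    return ["\t".join(r[3]) for r in records]


unsorted_peaks = [
    "chr2\t50\t100\tpeak3",
    "track name=peaks",
    "chr1\t200\t300\tpeak2",
    "chr1\t10\t40\tpeak1",
]
for record in sort_bed_lines(unsorted_peaks):
    print(record)
```

Because chromosome names are compared as strings, "chr10" sorts before "chr2". If a downstream tool expects a different chromosome order, the sort key would need to change.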
Next, add the peak-caller to the PEAK_CALLERS list at the top of lib/patterns_targets.py.
Then write a rule for the peak-caller, again using the macs2, spp, or sicer rules as a guide.
Last, add additional lines in workflows/chipseq/config/chipseq-patterns.yaml for the patterns_by_peaks key.
To test or use, add the new peak-caller to the workflows/chipseq/config/config.yaml file’s peak_calling key.