For Developers
==============

Creating and updating conda envs
--------------------------------

The ``env.yml`` and ``env-r.yml`` files contain fully-pinned versions of the
environments. Pinning helps with stability and can dramatically speed up
environment creation. However, these environment definitions periodically need
to be updated.

To do so, create new environments using the unpinned versions in
``include/requirements.txt`` and ``include/requirements-r.txt``. Because the
versions are unpinned, these environments may take substantially longer to
create.

Then run the tests (:ref:`running-the-tests`) using those environments.

If all tests pass, then export the newly-created environments to the
``env.yml`` and ``env-r.yml`` files.

When you commit and push those files, the CI/CD system will detect that they
are different and will trigger a re-build of the cached environments and
proceed with the tests using those new environments.
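
The update cycle above can be sketched as a small helper script. This is
a minimal sketch and not part of the repository: the function names and the
``./env-unpinned`` prefix are hypothetical, and it assumes that the
requirements files are in the format accepted by ``conda create --file`` and
that the required channels are already configured.

.. code-block:: python

    import subprocess


    def create_command(prefix, requirements):
        """Command to build an environment from an unpinned requirements file."""
        return ["conda", "create", "--yes", "--prefix", prefix, "--file", requirements]


    def export_command(prefix, pinned_yaml):
        """Command to export a fully-pinned definition of an existing environment."""
        return ["conda", "env", "export", "--prefix", prefix, "--file", pinned_yaml]


    def run(command):
        """Run one command, raising CalledProcessError if it fails."""
        subprocess.run(command, check=True)

For the main environment this would be
``run(create_command("./env-unpinned", "include/requirements.txt"))``, then the
tests, then ``run(export_command("./env-unpinned", "env.yml"))``. The
R environment would follow the same pattern with
``include/requirements-r.txt`` and ``env-r.yml``.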

Running the full complex datasets
---------------------------------

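Prior to a release, the complex datasets should be run. They test corner cases
more extensively than the standard tests, and should be run on a cluster or
a machine with substantial resources. The configs can be found in
``test/test_configs``. Here is how to run the RNA-seq complex dataset using
``include/WRAPPER_SLURM``. The relative paths assume the command is run from
a directory two levels below the top of the repository: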

.. code-block:: bash

    sbatch ../../include/WRAPPER_SLURM \
      --configfile ../../test/test_configs/complex-dataset-rnaseq-config.yaml \
      --config sampletable=../../test/test_configs/complex-dataset-rnaseq-sampletable.tsv

Module documentation
--------------------

.. toctree::
   :maxdepth: 2

   lib.common
   lib.chipseq
   lib.patterns_targets


Adding a new aligner
--------------------

Modules
^^^^^^^

In ``lib/common.py``, there is a function ``references_dict``. Within that is
an ``index_extensions`` dictionary. You'll need to add the name of the aligner
and the extension of the index it creates. If it creates multiple index files,
the extension of just one of them is sufficient. The filename will be
automatically created and used as the expected output file, which can then be
accessed from the references dict as
``references_dict[organism][tag][aligner]`` for use in any rule that needs the
index as input (that is, any mapping rule).
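
As an illustration of the mapping described above, the following minimal
sketch shows how one extension per aligner can be turned into an expected
index filename. It is not the code in ``lib/common.py``: the extensions, the
``newaligner`` entry, the ``expected_index`` helper, and the directory layout
are all assumptions made for this example.

.. code-block:: python

    import os

    # Hypothetical stand-in for the ``index_extensions`` dictionary. The
    # entries and their extensions are illustrative assumptions.
    index_extensions = {
        "bowtie2": ".1.bt2",
        "hisat2": ".1.ht2",
        "newaligner": ".idx",  # hypothetical new aligner and extension
    }


    def expected_index(references_dir, organism, tag, aligner):
        """Build the path of the single index file used as the expected output."""
        ext = index_extensions[aligner]
        filename = "{0}_{1}{2}".format(organism, tag, ext)
        return os.path.join(references_dir, organism, tag, aligner, filename)

With this layout, an aligner that is missing from the dictionary fails early
with a ``KeyError``, which is one reason to register the extension before
writing any rules that depend on the index.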

Configuration
^^^^^^^^^^^^^

- Add the aligner to the ``indexes:`` section of the
  ``include/reference_configs/test.yaml`` config file.

- Write a rule in ``workflows/references/Snakefile`` to build the index. Use
  the other index-building rules there as a guide.

- Add a mapping rule to each workflow the aligner is appropriate for. Enclose
  the rule in an ``if`` clause so that it only runs if the config file has
  specified that aligner.

- Add the name to the list of supported aligners in ``docs/config-yaml.rst``,
  in the "Aligner config" section.

- Add appropriate memory and time requirements to the rule for that aligner.

Testing
^^^^^^^

- For testing, create a copy of the config for each workflow the aligner is
  used in, and change only the aligner.

- Modify ``.circleci/config.yml`` to include a new block in each of the
  variables, jobs, and workflows sections. Use the ``rnaseq-star`` blocks as
  a guide for this. The idea is to run only up through the aligner step in
  a parallel task (to save on CI build time).


.. _new-peak-caller:

Adding a new peak-caller
------------------------

First, write a wrapper for the peak-caller. You can use the ``macs2``, ``spp``,
and ``sicer`` wrappers as a guide. A wrapper should expect one or more sorted
and indexed BAM files as IP, and one or more sorted and indexed BAM files as
input. The wrapper should create at least a sorted BED file of peaks, and can
optionally create other supplemental files as well.

Next, add the peak-caller to the top of ``lib/patterns_targets.py`` in the
``PEAK_CALLERS`` list.

Then write a rule for the peak-caller, again using ``macs2``, ``spp``, or
``sicer`` rules as a guide.

Last, add additional lines in
``workflows/chipseq/config/chipseq-patterns.yaml`` for the
``patterns_by_peaks`` key.

To test or use, add the new peak-caller to the
``workflows/chipseq/config/config.yaml`` file's ``peak_calling`` key.
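
A quick way to catch a peak-caller that is named in the config but was never
registered is a check like the following minimal sketch. It is not part of the
repository: ``newcaller`` and ``unsupported_algorithms`` are hypothetical, the
list is a stand-in for the one in ``lib/patterns_targets.py``, and the sketch
assumes that each entry under ``peak_calling`` is a mapping with an
``algorithm`` key.

.. code-block:: python

    # Stand-in for the list in lib/patterns_targets.py, with a hypothetical
    # "newcaller" entry added.
    PEAK_CALLERS = ["macs2", "spp", "sicer", "newcaller"]


    def unsupported_algorithms(peak_calling):
        """Return algorithms used in peak-calling runs but not in PEAK_CALLERS."""
        requested = {run["algorithm"] for run in peak_calling}
        return sorted(requested - set(PEAK_CALLERS))

An empty list means every peak-calling run in the config refers to
a registered peak-caller.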