Guide to file hierarchy

The lcdb-wf workflow system uses a standardized directory structure and file hierarchy. This keeps analyses consistent across diverse experiments and data sources, and it reduces the overhead of troubleshooting when something goes wrong. All components of the repository are laid out with this design principle in mind.

Below we give a high-level overview and brief description of the files and folders used by the workflows, and include an annotated directory tree highlighting the most important parts of the repository.

Folder organization

The top level of the repo looks like this:

[1]  ├── ci/
[2]  ├── docs/
[3]  ├── include/
[4]  ├── lib/
[5]  ├── README.md
[6]  ├── requirements-non-r.txt
[7]  ├── requirements-r.txt
[8]  ├── workflows/
[9]  └── wrappers/

  1. ci/ contains infrastructure for continuous integration testing. You don’t need to worry about this unless you’re actively developing lcdb-wf.

  2. docs/ contains the source for documentation. You’re reading it.

  3. include/ has miscellaneous files and scripts that can be used by all workflows. Of particular note is the WRAPPER_SLURM script (see Running on a cluster for more) and the reference_configs directory (see References workflow and Configuration for more).

  4. lib/ contains Python modules used by the workflows.

  5. README.md contains top-level info.

  6. requirements-non-r.txt contains the package dependencies needed to run the workflows, and is used to set up a conda environment.

  7. requirements-r.txt contains the package dependencies for R and various Bioconductor packages used in downstream analysis. See conda and conda envs in lcdb-wf for the rationale for splitting these.

  8. workflows/ contains one directory for each workflow. Each workflow directory contains its own Snakefile and configuration files. We go into more detail in the next section.

  9. wrappers/ contains Snakemake wrappers, which are scripts that can use their own independent environment. See wrappers for more.
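
As a quick sanity check after cloning, the top-level layout described above can be verified with a short script. This is a minimal sketch rather than part of lcdb-wf: the function name missing_entries and the EXPECTED_ENTRIES list are illustrative assumptions, with the entries copied from the listing above.

```python
from pathlib import Path

# Top-level entries from the listing above; a trailing "/" marks a directory.
# This list is an assumption taken from the docs, not read from the repo.
EXPECTED_ENTRIES = [
    "ci/",
    "docs/",
    "include/",
    "lib/",
    "README.md",
    "requirements-non-r.txt",
    "requirements-r.txt",
    "workflows/",
    "wrappers/",
]


def missing_entries(repo_root):
    """Return the expected top-level entries that are absent from repo_root."""
    root = Path(repo_root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories must exist as directories, files as regular files.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing
```

Calling missing_entries(".") from the root of a checkout returns an empty list when every entry in the listing is present.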

Below, you can see a detailed overview of the files contained in these folders.

Annotated tree

The following is an annotated directory tree of the lcdb-wf repository to help orient you.

README.md

workflows/

   references/

      Snakefile

   rnaseq/

      Snakefile

      config/

         sampletable.tsv

         rnaseq_patterns.yaml

         rnaseq.Rmd

         gene-patterns.Rmd

         functional-enrichment.Rmd

   chipseq/

      Snakefile

      config/

         sampletable.tsv

         chipseq_patterns.yaml

requirements.txt

ci/

   build-docs.sh

   dependency_consistency.py

   get-data.py

   key.enc

   travis-run.sh

   travis-setup.sh

config/

   sampletable.tsv

   config.yml

   4c-sampletable.tsv

   envs/

   multiqc_config.yaml

   test_4c_config.yaml

   test_config.yaml

   rnaseq-requirements.txt

include/

   adapters.fa

   WRAPPER_SLURM

lib/

   common.py

   postprocess/

      adapters.py

      dicty.py

      dm6.py

      erccFisher.py

      ercc.py

      hg19.py

      hg38.py

      __init__.py

      merge.py

      phix.py

      sacCer3.py

make_trackhub.py

wrappers/

   .gitignore

   LICENSE

   README.md

   test/

      conftest.py

      raw_data_fixtures.py

      test_atropos.py

      test_bowtie2.py

      test_cutadapt.py

      test_deeptools.py

      test_demo.py

      test_dupradar.py

      test_fastqc.py

      test_fastq_screen.py

      test_featurecounts.py

      test_hisat2.py

      test_kallisto.py

      test_multiqc.py

      test_picard.py

      test_rseqc.py

      test_salmon.py

      test_samtools.py

      test_toy.py

      utils.py

   wrappers/
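
Because each workflow directory under workflows/ carries its own Snakefile, the available workflows can be discovered by walking that directory. The following is a minimal sketch under that assumption; find_workflows is a hypothetical helper, not a function provided by lcdb-wf.

```python
from pathlib import Path


def find_workflows(repo_root):
    """Map each workflow directory name to the path of its Snakefile.

    Assumes the layout described in this guide: workflows/<name>/Snakefile.
    Subdirectories without a Snakefile are skipped.
    """
    workflows_dir = Path(repo_root) / "workflows"
    found = {}
    if not workflows_dir.is_dir():
        return found
    for child in sorted(workflows_dir.iterdir()):
        snakefile = child / "Snakefile"
        if child.is_dir() and snakefile.is_file():
            found[child.name] = snakefile
    return found
```

For the tree shown above, this would report the references, rnaseq, and chipseq workflows, provided each directory still contains a Snakefile.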

Now that you have seen which files and folders are the most important and have some idea of where everything lives, you can run the tests to make sure everything is set up correctly (see Testing the installation), or jump right in and learn how to configure the workflows for your particular experiment (see Configuration).