Guide to file hierarchy
The lcdb-wf workflow system uses a standardized directory structure and file
hierarchy. This keeps diverse analyses and data sources consistent with one
another and reduces the overhead of troubleshooting when something goes wrong.
Every component of the repository is laid out with this design principle in mind.
Below we give a high-level overview and brief description of the files and folders used by the workflows, and include an annotated directory tree highlighting the most important parts of the repository.
Folder organization
The top level of the repo looks like this:
├── ci/
├── docs/
├── include/
├── lib/
├── README.md
├── requirements-non-r.txt
├── requirements-r.txt
├── workflows/
└── wrappers/
ci/
contains infrastructure for continuous integration testing. You don’t have to worry about this stuff unless you’re actively developing lcdb-wf.
docs/
contains the source for documentation. You’re reading it.
include/
has miscellaneous files and scripts that can be used by all workflows. Of particular note is the WRAPPER_SLURM script (see Running on a cluster for more) and the reference_configs directory (see References workflow and Configuration for more).
lib/
contains Python modules used by the workflows.
README.md
contains top-level info.
requirements-non-r.txt
contains the package dependencies needed to run the workflows, and is used to set up a conda environment.
requirements-r.txt
contains the package dependencies for R and various Bioconductor packages used in downstream analysis. See conda and conda envs in lcdb-wf for the rationale for splitting these.
workflows/
contains one directory for each workflow. Each workflow directory contains its own Snakefile and configuration files. We go into more detail in the next section.
wrappers/
contains Snakemake wrappers, which are scripts that can use their own independent environment. See wrappers for more.
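As a quick way to confirm that a clone matches the layout described above, the following is a minimal sketch in Python. It is not part of lcdb-wf: the names `EXPECTED_ENTRIES`, `missing_entries`, and `workflows_with_snakefile` are hypothetical helpers introduced here for illustration. The sketch assumes only what the list above states, namely the nine top-level entries and that each workflow directory holds its own Snakefile.

```python
from pathlib import Path

# Top-level entries named in the overview above (hypothetical constant,
# not defined anywhere in lcdb-wf). A trailing "/" marks a directory.
EXPECTED_ENTRIES = [
    "ci/", "docs/", "include/", "lib/", "README.md",
    "requirements-non-r.txt", "requirements-r.txt",
    "workflows/", "wrappers/",
]


def missing_entries(repo_root):
    """Return the expected top-level entries that are absent from repo_root."""
    root = Path(repo_root)
    missing = []
    for entry in EXPECTED_ENTRIES:
        path = root / entry.rstrip("/")
        # Directories and plain files are checked differently.
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


def workflows_with_snakefile(repo_root):
    """Return sorted names of workflows/ subdirectories containing a Snakefile."""
    workflows = Path(repo_root) / "workflows"
    if not workflows.is_dir():
        return []
    return sorted(
        d.name
        for d in workflows.iterdir()
        if d.is_dir() and (d / "Snakefile").is_file()
    )


if __name__ == "__main__":
    import sys

    # Default to the current directory if no path is given.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print("missing:", missing_entries(root))
    print("workflows:", workflows_with_snakefile(root))
```

Run from the top level of a clone, an empty "missing" list suggests the layout matches this page. The second line lists whichever workflow directories contain a Snakefile in that checkout.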
Below, you can see a detailed overview of the files contained in these folders.
Annotated tree
The following is an annotated directory tree of the lcdb-wf repository to help
orient you. Hover over files for a tooltip description; click a file to view
the most recent version on GitHub.
Files in bold are the most important.
Now that you have some idea of where everything lives and which files and folders matter most, you can either run the tests to make sure everything is set up correctly (see Testing the installation) or jump right in to configuring the workflows for your particular experiment (see Configuration).