Module lib.chipseq
Handling ChIP-seq peak-calling configuration correctly is complex. The functions in this module help manipulate the config information so we can use it more easily in the ChIP-seq workflow without cluttering the Snakefile.
- peak_calling_dict: Returns a dictionary of peak-calling runs from the config.
- block_for_run: Returns the block for the (label, algorithm) run.
- Returns the sample names configured for a particular peak-calling run.
- merged_input_for_ip: Returns the merged input label for a merged IP label.
- detect_peak_format: Figure out if a BED file is narrowPeak or broadPeak.
Details
Helpers for ChIP-seq.
- lib.chipseq.block_for_run(config, label, algorithm)
Returns the block for the (label, algorithm) run.
- Parameters:
config (dict)
label (str)
algorithm (str)
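The lookup can be pictured with a minimal sketch. The config layout used here (a `chipseq` section holding a `peak_calling` list whose entries carry `label` and `algorithm` keys, plus `ip` and `control` lists) and the helper name `find_block` are assumptions made for illustration; they are not taken from the module's source.

```python
# Hypothetical config layout: each peak-calling "block" is identified by
# its (label, algorithm) pair. All key names below are assumptions.
config = {
    "chipseq": {
        "peak_calling": [
            {
                "label": "gaf-1",
                "algorithm": "macs2",
                "ip": ["s2cell-gaf-1"],
                "control": ["s2cell-input-1"],
            },
            {
                "label": "gaf-1",
                "algorithm": "sicer",
                "ip": ["s2cell-gaf-1"],
                "control": ["s2cell-input-1"],
            },
        ]
    }
}


def find_block(config, label, algorithm):
    """Return the single block matching (label, algorithm).

    Illustrative stand-in for block_for_run, not the real implementation.
    """
    matches = [
        block
        for block in config["chipseq"]["peak_calling"]
        if block["label"] == label and block["algorithm"] == algorithm
    ]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one block for {(label, algorithm)}, "
            f"found {len(matches)}"
        )
    return matches[0]


block = find_block(config, "gaf-1", "macs2")
print(block["ip"], block["control"])
```

Raising an error when zero or several blocks match is a design choice of this sketch; it surfaces a misconfigured run early instead of silently picking one.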
- lib.chipseq.detect_peak_format(fn)
Figure out if a BED file is narrowPeak or broadPeak.
Returns None if undetermined.
This is useful for choosing the autoSql file and the bigBed type to use: plain BED 6, BED 6+4 (narrowPeak), or BED 6+3 (broadPeak).
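One way such a check could work is to count the tab-separated fields on the first data line: narrowPeak has 10 columns (BED 6+4) and broadPeak has 9 (BED 6+3). The sketch below takes that approach. The function name `guess_peak_format` and the choice to skip `track`, `browser`, and comment lines are assumptions for illustration, and the real `detect_peak_format` may use a different strategy.

```python
import os
import tempfile


def guess_peak_format(path):
    """Return 'narrowPeak', 'broadPeak', or None if undetermined.

    Illustrative sketch only: decides from the column count of the
    first data line in an uncompressed, tab-separated file.
    """
    with open(path) as handle:
        for line in handle:
            # Skip blank lines and header-style lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            n_fields = len(line.rstrip("\n").split("\t"))
            if n_fields == 10:
                return "narrowPeak"  # BED 6+4
            if n_fields == 9:
                return "broadPeak"  # BED 6+3
            return None
    return None


# Small demonstration using temporary files with made-up peak records.
narrow_line = "chr2L\t100\t200\tpeak1\t500\t.\t12.5\t8.1\t6.3\t50\n"
broad_line = "chr2L\t100\t900\tpeak1\t500\t.\t12.5\t8.1\t6.3\n"

results = {}
for name, line in [("narrow", narrow_line), ("broad", broad_line)]:
    with tempfile.NamedTemporaryFile("w", suffix=".bed", delete=False) as tmp:
        tmp.write(line)
    results[name] = guess_peak_format(tmp.name)
    os.remove(tmp.name)

print(results)  # {'narrow': 'narrowPeak', 'broad': 'broadPeak'}
```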
- lib.chipseq.merged_input_for_ip(sampletable, merged_ip)
Returns the merged input label for a merged IP label.
This is primarily used for the fingerprint rule, where we collect all the available input BAMs together.
- Parameters:
sampletable (pandas.DataFrame)
merged_ip (str) – Label of IP to use, must be present in the label column of the sampletable.
Examples
This is easier to follow with an example.
Samples ip1 and ip2 are technical replicates. They are from a different experiment than ip3 and input3, hence their different biological_material.
We know that input1 should be paired with ip1 and ip2 because it shares their biological material (s2cell-1).
Now compare input1 and input9. In the sample table below they share the same biological material and also the same label (s2cell-input-1), so both fall under a single merged input label.
>>> from io import StringIO
>>> import pandas as pd
>>> df = pd.read_csv(StringIO('''
... samplename   antibody   biological_material   label
... ip1          gaf        s2cell-1              s2cell-gaf-1
... ip2          gaf        s2cell-1              s2cell-gaf-1
... ip3          ctcf       s2cell-2              s2cell-ctcf-1
... input1       input      s2cell-1              s2cell-input-1
... input3       input      s2cell-2              s2cell-input-3
... input9       input      s2cell-1              s2cell-input-1'''),
... sep=r'\s+')
>>> merged_input_for_ip(df, 's2cell-gaf-1')
['s2cell-input-1']
>>> merged_input_for_ip(df, 's2cell-ctcf-1')
['s2cell-input-3']
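The pairing rule described above can also be sketched in plain Python, using a list of dicts in place of the DataFrame. This is not the module's implementation: the helper name `input_labels_for_ip` is hypothetical, and the sketch assumes that input samples are the rows whose antibody column equals "input".

```python
# The same sample table as in the doctest, as plain dicts.
columns = ["samplename", "antibody", "biological_material", "label"]
table = [
    ["ip1", "gaf", "s2cell-1", "s2cell-gaf-1"],
    ["ip2", "gaf", "s2cell-1", "s2cell-gaf-1"],
    ["ip3", "ctcf", "s2cell-2", "s2cell-ctcf-1"],
    ["input1", "input", "s2cell-1", "s2cell-input-1"],
    ["input3", "input", "s2cell-2", "s2cell-input-3"],
    ["input9", "input", "s2cell-1", "s2cell-input-1"],
]
rows = [dict(zip(columns, values)) for values in table]


def input_labels_for_ip(rows, merged_ip):
    """Return the unique input labels sharing biological material with the IP.

    Hypothetical helper; assumes inputs are marked by antibody == "input".
    """
    # Step 1: which biological material do the merged IP's samples use?
    materials = {
        row["biological_material"] for row in rows if row["label"] == merged_ip
    }
    if not materials:
        raise ValueError(f"label {merged_ip!r} not found in the sample table")

    # Step 2: collect the labels of input samples from that material,
    # keeping first-seen order and dropping duplicates.
    labels = []
    for row in rows:
        if (
            row["antibody"] == "input"
            and row["biological_material"] in materials
            and row["label"] not in labels
        ):
            labels.append(row["label"])
    return labels


print(input_labels_for_ip(rows, "s2cell-gaf-1"))   # ['s2cell-input-1']
print(input_labels_for_ip(rows, "s2cell-ctcf-1"))  # ['s2cell-input-3']
```

Because input1 and input9 carry the same label, de-duplicating in step 2 yields one merged input label for the gaf IP, matching the doctest output.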
- lib.chipseq.peak_calling_dict(config, algorithm=None)
Returns a dictionary of peak-calling runs from the config.
- Parameters:
config (dict)
algorithm (str or None) – If algorithm is None, the dictionary is keyed by (label, algorithm) tuples. Otherwise, only the runs for that algorithm are returned, keyed by label.
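The two keying modes can be illustrated with a short sketch. As before, the config layout (`chipseq` → `peak_calling` list of blocks with `label` and `algorithm` keys) and the function name `build_run_dict` are assumptions for illustration and may not match the module's actual code.

```python
# Hypothetical config with three peak-calling runs (key names assumed).
config = {
    "chipseq": {
        "peak_calling": [
            {"label": "gaf-1", "algorithm": "macs2"},
            {"label": "gaf-1", "algorithm": "sicer"},
            {"label": "ctcf-1", "algorithm": "macs2"},
        ]
    }
}


def build_run_dict(config, algorithm=None):
    """Index peak-calling blocks by (label, algorithm), or by label only.

    Illustrative stand-in for peak_calling_dict.
    """
    runs = {}
    for block in config["chipseq"]["peak_calling"]:
        key = (block["label"], block["algorithm"])
        if key in runs:
            # A repeated (label, algorithm) pair would be ambiguous.
            raise ValueError(f"duplicate peak-calling run: {key}")
        runs[key] = block

    if algorithm is None:
        return runs

    # Restrict to one algorithm; the label alone is then a unique key.
    return {
        label: block
        for (label, algo), block in runs.items()
        if algo == algorithm
    }


all_runs = build_run_dict(config)
macs2_runs = build_run_dict(config, algorithm="macs2")

print(sorted(all_runs))    # [('ctcf-1', 'macs2'), ('gaf-1', 'macs2'), ('gaf-1', 'sicer')]
print(sorted(macs2_runs))  # ['ctcf-1', 'gaf-1']
```

Keying by the tuple lets one label be run through several algorithms without collisions, while the per-algorithm view gives simpler keys where only one peak caller is relevant.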