| Parameters: |
- de_results (str or pandas.DataFrame) – If str, it’s the filename of a TSV of differential expression results,
with first column as gene ID. It will be parsed into a dataframe where
the index is gene ID. When called as a library, an already-created
pandas.DataFrame can optionally be provided instead.
- regions (str or pybedtools.BedTool) – Gene regions in which to look for intersections with peaks. BED file
whose 4th column contains gene IDs that are also present in the first
column of de_results. Typically this would be a BED file of promoters
or gene bodies. When called as a library, a pybedtools.BedTool object
can optionally be provided instead.
- peaks (str or pybedtools.BedTool) – BED file to be intersected with regions. When called as a library,
a pybedtools.BedTool object can optionally be provided instead.
- selected (str or list-like) – Replaces the regions and peaks arguments; useful when you already know
which genes you want to select (e.g., genes upregulated in a different
experiment). If a string, it is assumed to be a filename, and its first
column is used as an index into the de_results dataframe.
When called as a library, if selected is not a string it will be used
directly as an index into the dataframe.
- x (str) – Column to use for x-axis. Default of “baseMean” expects DESeq2
results.
- y (str) – Column to use for y-axis. Default of “log2FoldChange” expects DESeq2
results.
- disable_logx (bool) – Disable default behavior of transforming x values using log10.
- logy (bool) – Transform y values using log2.
- pval_col (str) – Column to use for statistical significance. Default “padj” expects
DESeq2 results.
- alpha (float) – Threshold for calling significance. Applied to pval_col.
- lfc_cutoff (float) – Log2 fold change cutoff to be applied to y values. Threshold is applied
post-transformation, if one is specified (e.g., via the logy argument).
- plot_filename (str) – File to save plot. Format auto-detected by extension. Output directory
will be created if needed.
- disable_raster_points (bool) – Disable the default behavior of rasterizing points in a PDF. Use
sparingly, since drawing 30k+ individual points in a PDF may slow down
your machine.
- genes_to_label (str or list-like) – Optional file containing genes to label with text. First column must be
a subset of the first column of de_results. Lines starting with ‘#’
and subsequent tab-separated columns will be ignored. When called as
a library, a list-like object of gene IDs can be provided.
- label_column (str) – Optional column from which to take gene labels found in
genes_to_label (e.g., “symbol”). If the value in this column is
missing, fall back to the index. Use this if your gene IDs are long
Ensembl IDs but you want the gene symbols to show up on the plot.
- report (str) – Where to write out Fisher’s exact test results. Default is stdout.
- gene_lists (str) – Prefix for gene list filenames. If specified, gene lists corresponding to the
cells of the 2x2 Fisher’s exact test will be written to
{prefix}.up.tsv and {prefix}.dn.tsv. These are subsets of de_results
where genes are upregulated and have a peak in region (or are selected), or
downregulated and have a peak in region (or are selected),
respectively.
|
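
To make the interaction between x, y, disable_logx, logy, pval_col, alpha, lfc_cutoff and selected more concrete, here is a minimal sketch using pandas only. It is not the tool's actual implementation: the helper name `classify`, the default values in its signature, the strict `<` comparison against alpha, the symmetric use of lfc_cutoff for up and down calls, and the toy data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy DESeq2-style table indexed by gene ID; the values are invented.
de_results = pd.DataFrame(
    {
        "baseMean": [10.0, 100.0, 1000.0, 50.0, 5.0],
        "log2FoldChange": [2.0, -1.5, 0.2, 3.0, -2.5],
        "padj": [0.01, 0.001, 0.5, 0.2, 0.03],
    },
    index=["geneA", "geneB", "geneC", "geneD", "geneE"],
)


def classify(df, selected, x="baseMean", y="log2FoldChange", pval_col="padj",
             alpha=0.1, lfc_cutoff=0.0, disable_logx=False, logy=False):
    """Hypothetical helper: transform axes, then flag up/down/selected genes."""
    out = pd.DataFrame(index=df.index)
    # x is log10-transformed unless disable_logx is set.
    out["x"] = df[x] if disable_logx else np.log10(df[x])
    # y is log2-transformed only when logy is set.
    out["y"] = np.log2(df[y]) if logy else df[y]
    # Assumption: "significant" means strictly below alpha; NaN is not significant.
    sig = df[pval_col] < alpha
    # The cutoff is applied to the (possibly transformed) y values.
    out["up"] = sig & (out["y"] > lfc_cutoff)
    out["dn"] = sig & (out["y"] < -lfc_cutoff)
    # A list-like `selected` is used directly as a set of gene IDs.
    out["selected"] = df.index.isin(list(selected))
    return out


calls = classify(de_results, ["geneA", "geneB", "geneD"], alpha=0.05, lfc_cutoff=1.0)

# Genes that would land in the "up" and "dn" gene lists under these assumptions.
up_selected = list(calls.index[calls["up"] & calls["selected"]])
dn_selected = list(calls.index[calls["dn"] & calls["selected"]])

# 2x2 contingency table (upregulated vs. selected) of the kind a
# Fisher's exact test would take as input.
table_up = pd.crosstab(calls["up"], calls["selected"])
print(up_selected, dn_selected)
print(table_up)
```

In this sketch geneD is selected but not called upregulated because its padj exceeds alpha, which shows why the alpha and lfc_cutoff thresholds matter for the counts that feed the test.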