API

Testing plotting

lcdblib.plotting.compare_rnaseq_and_chipseq.plot(de_results, regions=None, peaks=None, selected=None, x='baseMean', y='log2FoldChange', disable_logx=False, logy=False, pval_col='padj', alpha=0.1, lfc_cutoff=0, plot_filename=None, disable_raster_points=False, genes_to_label=None, label_column=None, report=None, gene_lists=None)

M-A plot showing up- and downregulated genes, with optional labeling and Fisher's exact tests.

If --plot-filename is not specified, the plot is displayed and points can be clicked for interactive exploration.

If --peaks and --regions are specified, results from Fisher's exact tests are printed to stdout, or written to --report if specified.
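To make the Fisher's exact test step concrete, here is a minimal pure-Python sketch of how a 2x2 table could be built from a set of changed genes and a set of genes with a peak in their region, followed by a two-sided test. The function names (contingency_table, fisher_exact_two_sided), the table layout, and the toy gene IDs are assumptions for illustration only; this is not lcdblib's implementation, which may arrange the table or compute the test differently.

```python
from math import comb


def contingency_table(all_genes, changed, with_peak):
    """Return (a, b, c, d) for the 2x2 table (layout is an assumption):

                     peak    no peak
        changed       a        b
        unchanged     c        d
    """
    all_genes = set(all_genes)
    changed = set(changed) & all_genes
    with_peak = set(with_peak) & all_genes
    a = len(changed & with_peak)
    b = len(changed - with_peak)
    c = len(with_peak - changed)
    d = len(all_genes - changed - with_peak)
    return a, b, c, d


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one
    # (small relative tolerance guards against float rounding on ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Toy data: hypothetical gene IDs, not taken from any real experiment.
all_genes = {f"g{i}" for i in range(1, 9)}
up = {"g1", "g2", "g3", "g4"}
with_peak = {"g1", "g2", "g3", "g5"}

table = contingency_table(all_genes, up, with_peak)
p_value = fisher_exact_two_sided(*table)
print(table, round(p_value, 4))
```

The same table would be built a second time with the downregulated genes in place of `up`, giving one test per direction.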

Parameters:
  • de_results (str or pandas.DataFrame) – If str, it’s the filename of a TSV of differential expression results, with first column as gene ID. It will be parsed into a dataframe where the index is gene ID. When called as a library, an already-created pandas.DataFrame can optionally be provided instead.
  • regions (str or pybedtools.BedTool) – Gene regions in which to look for intersections with peaks. BED file where the 4th column contains gene IDs that are also present in first column of de_results. Typically this would be a BED file of promoters or gene bodies. When called as a library, a pybedtools.BedTool object can optionally be provided instead.
  • peaks (str or pybedtools.BedTool) – BED file to be intersected with regions. When called as a library, a pybedtools.BedTool object can optionally be provided instead.
  • selected (str or list-like) – Replaces the regions and peaks arguments; useful when you already know which genes you want to select (e.g., genes upregulated in a different experiment). If a string, it is assumed to be a filename, and its first column is used as an index into the de_results dataframe. When called as a library, if selected is not a string it is used directly as an index into the dataframe.
  • x (str) – Column to use for x-axis. Default of “baseMean” expects DESeq2 results.
  • y (str) – Column to use for y-axis. Default of “log2FoldChange” expects DESeq2 results.
  • disable_logx (bool) – Disable default behavior of transforming x values using log10
  • logy (bool) – Transform y values using log2
  • pval_col (str) – Column to use for statistical significance. Default of “padj” expects DESeq2 results.
  • alpha (float) – Threshold for calling significance. Applied to pval_col
  • lfc_cutoff (float) – Log2 fold change cutoff to be applied to y values. The threshold is applied after any transformation that was requested (e.g., via the logy argument).
  • plot_filename (str) – File to save plot. Format auto-detected by extension. Output directory will be created if needed.
  • disable_raster_points (bool) – Disable the default behavior of rasterizing points in a PDF. Use sparingly, since drawing 30k+ individual points in a PDF may slow down your machine.
  • genes_to_label (str or list-like) – Optional file containing genes to label with text. First column must be a subset of the first column of de_results. Lines starting with ‘#’ and subsequent tab-separated columns will be ignored. When called as a library, a list-like object of gene IDs can be provided.
  • label_column (str) – Optional column from which to take gene labels found in genes_to_label (e.g., “symbol”). If the value in this column is missing, fall back to the index. Use this if your gene IDs are long Ensembl IDs but you want the gene symbols to show up on the plot.
  • report (str) – Where to write out Fisher’s exact test results. Default is stdout
  • gene_lists (str) – Prefix to gene lists. If specified, gene lists corresponding to the cells of the 2x2 Fisher's exact test will be written to {prefix}.up.tsv and {prefix}.dn.tsv. These are subsets of de_results where genes are upregulated and have a peak in region (or are selected), or downregulated and have a peak in region (or are selected), respectively.
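The way alpha, lfc_cutoff, disable_logx and logy interact can be sketched in plain Python. The snippet below is a hedged illustration, not lcdblib's code: the function name classify, the dict-of-dicts input, the strict inequalities, the symmetric use of lfc_cutoff for downregulated genes, and the treatment of a missing p-value (None) as not significant are all assumptions.

```python
import math


def classify(rows, x="baseMean", y="log2FoldChange", pval_col="padj",
             alpha=0.1, lfc_cutoff=0, disable_logx=False, logy=False):
    """Return {gene_id: (x_value, y_value, direction)}.

    `rows` maps gene ID to a dict of column values (a stand-in for the
    de_results dataframe). Values to be log-transformed must be positive.
    """
    out = {}
    for gene_id, row in rows.items():
        # x is log10-transformed by default; disable_logx turns that off.
        xv = row[x] if disable_logx else math.log10(row[x])
        # y is used as-is by default; logy applies a log2 transform.
        yv = math.log2(row[y]) if logy else row[y]
        p = row[pval_col]
        significant = p is not None and p < alpha
        # The cutoff is compared against the transformed y value.
        if significant and yv > lfc_cutoff:
            direction = "up"
        elif significant and yv < -lfc_cutoff:
            direction = "dn"
        else:
            direction = "unchanged"
        out[gene_id] = (xv, yv, direction)
    return out


# Toy DESeq2-like rows with hypothetical values.
rows = {
    "geneA": {"baseMean": 100.0, "log2FoldChange": 2.0, "padj": 0.01},
    "geneB": {"baseMean": 1000.0, "log2FoldChange": -1.5, "padj": 0.05},
    "geneC": {"baseMean": 10.0, "log2FoldChange": 3.0, "padj": 0.5},
    "geneD": {"baseMean": 10.0, "log2FoldChange": 0.5, "padj": 0.01},
    "geneE": {"baseMean": 10.0, "log2FoldChange": 4.0, "padj": None},
}

result = classify(rows, lfc_cutoff=1)
for gene_id, (xv, yv, direction) in result.items():
    print(gene_id, direction)
```

With lfc_cutoff=1, geneD is significant but falls inside the cutoff, so it is treated as unchanged; geneC and geneE fail the significance test regardless of their fold change.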