de - Differential Expression

Important

The model formulas in this module use the Patsy-lite mini-language. Be sure to read its documentation before writing your models!

Also remember to filter genes prior to differential expression analysis. The number of genes provided for hypothesis testing may affect the results. You may need to filter out genes that have zero expression in all of the samples you are interested in.
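As a rough illustration of the zero-expression filter described above, the following Python sketch drops genes whose counts are zero in every sample of interest. It is not part of detk: the function name drop_all_zero_genes, the toy data, and the assumed layout (a gene ID column followed by one column per sample) are hypothetical.

```python
import csv
import io


def drop_all_zero_genes(rows, sample_columns):
    """Keep rows with a nonzero count in at least one of the given samples."""
    return [
        row for row in rows
        if any(float(row[sample]) != 0 for sample in sample_columns)
    ]


# Toy counts matrix; the layout (gene ID column, then one column per
# sample) is an assumption made for this example.
toy_csv = """gene,sampleA,sampleB,sampleC
GENE1,0,0,0
GENE2,5,0,12
GENE3,0,0,3
"""

rows = list(csv.DictReader(io.StringIO(toy_csv)))

# Only sampleA and sampleB are "of interest" here, so GENE3 is dropped
# even though it is expressed in sampleC.
kept = drop_all_zero_genes(rows, ["sampleA", "sampleB"])
print([row["gene"] for row in kept])
```

Restricting the check to the samples of interest, rather than to all columns, matters when the counts file contains samples that will not enter the model.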

Each of the differential expression methods in this module accepts a design formula, a counts matrix file, and a column data file. The design formula is specified using the Patsy-lite mini-language. The counts and column data files must be formatted as for any other tool in detk.

deseq2

Command line interface to a canonical DESeq2 analysis. To run a DESeq2 analysis on a counts matrix and accompanying column data file:

detk-de deseq2 "counts ~ AgeOfDeath + Status" raw_counts.csv column_data.csv > deseq2_results.csv

This is roughly equivalent to the following R:

library(DESeq2)

counts <- read.csv("raw_counts.csv", row.names=1)

design.mat <- read.csv("column_data.csv")

dds <- DESeqDataSetFromMatrix(
    countData = counts,
    colData = design.mat,
    design = ~ AgeOfDeath + Status
)

dds <- DESeq(dds, minReplicatesForReplace=Inf)

write.csv(results(dds, cooksCutoff=Inf), "deseq2_results.csv")

The analysis implemented here differs from the default DESeq2 analysis in the following ways:

  • the design formula specified on the command line must have the value counts as the only term on the left-hand side
  • no outlier mean trimming based on Cook's distance is performed
  • no p-values or adjusted p-values are flagged or omitted due to outliers
  • estimated parameters, statistics, and p-values are reported for all variables in the model in the output, rather than just the last term (request the default behavior using the --last-term-only command line flag)
  • no independent filtering is performed
  • all columns related to a term in the model have the term name prepended in the output, e.g. Status__log2FoldChange
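To work with the prefixed column names downstream, one option is to split each name on the double underscore. The Python sketch below assumes the separator is always "__" as in the Status__log2FoldChange example; the helper name and the header list are hypothetical, and real output may name terms differently (for example for categorical variables).

```python
def split_result_column(column_name, sep="__"):
    """Split 'Term__statistic' into (term, statistic).

    Returns (None, column_name) when the name carries no term prefix.
    """
    term, found, statistic = column_name.partition(sep)
    if not found:
        return None, column_name
    return term, statistic


# Hypothetical header; only Status__log2FoldChange appears in the text above.
header = [
    "gene",
    "AgeOfDeath__log2FoldChange",
    "AgeOfDeath__padj",
    "Status__log2FoldChange",
    "Status__padj",
]

# Group the statistic names by the model term they belong to.
columns_by_term = {}
for name in header:
    term, statistic = split_result_column(name)
    if term is not None:
        columns_by_term.setdefault(term, []).append(statistic)

print(columns_by_term)
```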

Usage:

Usage:
    detk-de deseq2 [options] <design> <count_fn> <cov_fn>

Options:
    -o FILE --output=FILE  Destination of primary output [default: stdout]
    --rda=RDA              Filename passed to saveRDS() R function of the result
                           objects from the analysis
    --strict               Require that the sample order indicated by the column names in the
                           counts file are the same as, and in the same order as, the
                           sample order in the row names of the covariates file
    --norm-counts          Prevent DESeq2 from normalizing counts prior to
                           running differential expression, default behavior
                           assumes that provided counts are raw
    --last-term-only       Use the default DESeq2 behavior of returning DE parameters
                           for the last term in the model, default behavior is to
                           report parameters for all variables in the model
    --gene-wise-disp       Use estimateDispersionsGeneEst instead of estimateDispersions
    --cores=N              Tell DESeq2 to use N cores when running, requires the
                           BiocParallel Bioconductor package to be installed [default: none]
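The condition that --strict enforces can be checked ahead of time. The Python sketch below is an illustration only, not detk's implementation: it compares the sample names taken from the counts file column names against those taken from the covariates file row names. The function name and the sample names are hypothetical.

```python
def check_sample_order(count_columns, covariate_rows):
    """Compare sample names from the counts header and the covariate rows.

    Returns "ok" when both lists hold the same samples in the same order,
    "reordered" when the samples match but the order differs, and
    "mismatch" when the two sets of samples differ.
    """
    if set(count_columns) != set(covariate_rows):
        return "mismatch"
    if list(count_columns) != list(covariate_rows):
        return "reordered"
    return "ok"


# Hypothetical sample names for illustration.
print(check_sample_order(["s1", "s2", "s3"], ["s1", "s2", "s3"]))
print(check_sample_order(["s1", "s2", "s3"], ["s3", "s1", "s2"]))
print(check_sample_order(["s1", "s2", "s3"], ["s1", "s2", "s4"]))
```

Only the first case would satisfy the requirement described for --strict.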

Firth logistic regression

When comparing two classes of samples, Firth’s logistic regression, as described by Choi et al., has desirable statistical properties: compared with other DE methods, including DESeq2, it has a better controlled type I error rate and loses less power when additional variables are included in the model. This form of logistic regression uses a penalized likelihood to avoid the problem of complete separation of the data, a common occurrence in RNA-Seq data. One drawback is that the method requires more samples than DESeq2 and other negative binomial regression based methods (i.e. at least 10 replicates per condition).
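To see why complete separation is a problem for ordinary logistic regression, consider the Python sketch below. It uses made-up data in which the predictor perfectly separates the two classes and evaluates the unpenalized log-likelihood of a slope-only model at increasing slopes. The log-likelihood keeps rising toward zero, so no finite maximum likelihood estimate exists. The sketch does not implement Firth's penalty; it only illustrates the failure that the penalty is meant to avoid.

```python
import math


def log_likelihood(beta, xs, ys):
    """Unpenalized logistic log-likelihood for a slope-only model."""
    total = 0.0
    for x, y in zip(xs, ys):
        # P(y=1) = sigmoid(beta * x) and P(y=0) = sigmoid(-beta * x)
        z = beta * x if y == 1 else -beta * x
        # log(sigmoid(z)) in a numerically stable form
        if z >= 0:
            total += -math.log1p(math.exp(-z))
        else:
            total += z - math.log1p(math.exp(z))
    return total


# Made-up, centered predictor values that perfectly separate the classes:
# every negative value has class 0 and every positive value has class 1.
xs = [-1.5, -0.5, 0.5, 1.5]
ys = [0, 0, 1, 1]

# The log-likelihood increases with every larger slope tried here.
values = [log_likelihood(beta, xs, ys) for beta in (1, 10, 100)]
print(values)
```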

A counts term must be included on the right-hand side of the design formula:

detk-de firth "Status ~ AgeOfDeath + counts" norm_counts.csv column_data.csv > firth_results.csv

Usage:

Usage:
    detk-de firth [options] <design> <count_fn> <cov_fn>

Options:
    -o FILE --output=FILE  Destination of primary output [default: stdout]
    --rda=RDA              Filename passed to saveRDS() R function of the result
                           objects from the analysis
    --strict               Require that the sample order indicated by the column names in the
                           counts file are the same as, and in the same order as, the
                           sample order in the row names of the covariates file
    --standardize          Standardize counts prior to running logistic regression
                           as to obtain standardized (i.e. directly comparable)
                           beta coefficients
    --cores=N              Tell R to use N cores when running, requires the
                           parallel R package to be installed [default: none]
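The idea behind --standardize can be sketched in Python as follows. This assumes that standardization means subtracting the mean and dividing by the sample standard deviation for each gene; the exact procedure detk applies is not stated here, and the function name and values are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator), an assumption
    if sd == 0:
        # A constant gene carries no information; map it to all zeros.
        return [0.0 for _ in values]
    return [(value - mean) / sd for value in values]


# Made-up normalized counts for one gene across four samples.
gene_counts = [10.0, 20.0, 30.0, 40.0]
z_scores = standardize(gene_counts)
print([round(z, 3) for z in z_scores])
```

After this rescaling, a beta coefficient describes the effect of a one standard deviation change in a gene's counts, which is what makes coefficients comparable across genes with different expression levels.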