Welcome to de_toolkit’s documentation!

Introduction

de_toolkit is a suite of bioinformatics tools useful in differential expression analysis and other count-based high-throughput sequencing workflows. The tools are implemented either directly in Python or as convenience wrappers around R packages, using custom wrapper code. The documentation is convivial, free range, and complete-protein, and the package has very high test coverage.

The toolkit is both a Python module and a command line interface that wraps the primary module functions to make integration into workflows easy. For instance, to perform DESeq2 normalization of a counts matrix contained in the file counts_matrix.tsv, you could run on the command line:

detk-norm deseq2 counts_matrix.tsv > norm_counts_matrix.tsv

The counts in counts_matrix.tsv are normalized using the DESeq2 method and written to standard output, which the shell redirects into norm_counts_matrix.tsv.
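For intuition about what the deseq2 method computes, here is a minimal NumPy sketch of DESeq2's median-of-ratios normalization. It is not de_toolkit's implementation: it assumes the counts are already loaded as a genes × samples array (rows are genes, columns are samples), and the function names are hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Estimate one DESeq2-style size factor per sample (column).

    Hypothetical helper; `counts` is assumed to be a genes x samples array.
    """
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have a geometric mean of zero, so drop them.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples (the pseudo-reference sample).
    log_geo_means = log_counts.mean(axis=1)
    # Log ratio of each count to its gene's geometric mean.
    log_ratios = log_counts - log_geo_means[:, np.newaxis]
    # The size factor is the median ratio within each sample (taken in log space).
    return np.exp(np.median(log_ratios, axis=0))


def median_of_ratios_normalize(counts):
    """Divide each sample's (column's) counts by that sample's size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)


# Toy matrix: sample 2 was sequenced twice as deeply as sample 1.
toy = np.array([[10, 20],
                [5, 10],
                [100, 200],
                [0, 3]])  # excluded from size-factor estimation (zero in sample 1)
print(median_of_ratios_size_factors(toy))
print(median_of_ratios_normalize(toy))
```

In this toy matrix the size factors come out as 1/√2 and √2, so the first three genes normalize to identical values in both samples, which is the point of the method: differences in sequencing depth are removed before comparing samples.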

Check out the detk Quickstart to get started.

Installation

We suggest installing this package using pip:

pip install de_toolkit

If you are developing the toolkit or want the bleeding-edge version, install it from a source checkout with the setup.py script:

python setup.py install

This should make detk and its subtools available on the command line. Whenever you change the code, you will need to run this command again.
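After installing, a quick way to confirm that the command line tools landed on your PATH is a small Python check like the one below. It is a hypothetical helper, not part of de_toolkit, and the command names are only the ones mentioned on this page, so the list may not be exhaustive.

```python
import shutil

# Command names mentioned in this documentation; not an exhaustive list of detk tools.
DETK_COMMANDS = ("detk", "detk-norm", "detk-de", "detk-transform", "detk-enrich")


def find_detk_commands(commands=DETK_COMMANDS):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


if __name__ == "__main__":
    # str.format keeps this compatible with the Python 3.5 environment suggested below.
    for name, path in find_detk_commands().items():
        print("{:15} {}".format(name, path or "NOT FOUND"))
```

Any command reported as NOT FOUND usually means the install step did not run in the active environment, or that the environment's bin directory is not on PATH.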

R dependencies

The following packages are only required to use the corresponding submodule functions:

  • R packages
    • DESeq2 (detk-de deseq2, detk-transform rlog, detk-transform vst)
    • fgsea (detk-enrich fgsea)
    • logistf (detk-de firth)
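Because each R package is needed only by specific subcommands, a workflow can check up front whether the packages a given command needs are installed. The sketch below transcribes the list above into a lookup table and asks R through Rscript and requireNamespace(); the helper names are hypothetical and this is not part of de_toolkit.

```python
import shutil
import subprocess

# (tool, method) -> R packages it needs, transcribed from the list above.
R_PACKAGES_BY_COMMAND = {
    ("detk-de", "deseq2"): ["DESeq2"],
    ("detk-transform", "rlog"): ["DESeq2"],
    ("detk-transform", "vst"): ["DESeq2"],
    ("detk-enrich", "fgsea"): ["fgsea"],
    ("detk-de", "firth"): ["logistf"],
}


def required_r_packages(tool, method):
    """Return the R packages listed for `tool method`; empty if none are listed."""
    return list(R_PACKAGES_BY_COMMAND.get((tool, method), []))


def r_package_installed(package):
    """Return True/False if Rscript can answer, or None if Rscript is not on PATH."""
    if shutil.which("Rscript") is None:
        return None
    # Exit status 0 if the package can be loaded, 1 otherwise.
    expr = ('quit(save = "no", status = if (requireNamespace("{}", quietly = TRUE)) '
            '0 else 1)').format(package)
    result = subprocess.run(["Rscript", "-e", expr],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, required_r_packages("detk-de", "firth") returns ["logistf"], while a command not in the list above, such as detk-norm deseq2, returns an empty list.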

We strongly suggest using Anaconda to create an environment that contains the necessary software, e.g.:

conda create -n de_toolkit python=3.5
conda activate de_toolkit

./install_conda_packages.sh

# if you want to use the R functions (Firth, DESeq2, etc.)
Rscript install_r_packages.sh
