Welcome to de_toolkit’s documentation!
Introduction
de_toolkit is a suite of bioinformatics tools useful in differential
expression analysis and other high-throughput sequencing count-based workflows.
The tools are either implemented directly in Python or provided as convenience
wrappers around R packages using rpy2.
The toolkit is both a Python module and a command line interface that wraps the
primary module functions for easy integration into workflows. For instance, to
perform DESeq2 normalization of a counts matrix contained in the file
counts_matrix.tsv, you could run on the command line:
detk-norm deseq counts_matrix.tsv > norm_counts_matrix.tsv
The counts in the counts matrix file will be normalized using the DESeq2 method
and written to the norm_counts_matrix.tsv file, the equivalent of the following
R code:
library(DESeq2)
# TODO: add actual equivalent code
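To give a feel for what DESeq2 normalization computes, here is a minimal pure-Python sketch of the median-of-ratios method. It is an illustration only, not the toolkit's code or DESeq2's implementation: the function names `size_factors` and `normalize` and the toy matrix are hypothetical, and genes with a zero count in any sample are excluded from the size factor estimate.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample (column) by median-of-ratios.

    counts: list of rows (genes), each a list of per-sample counts.
    Hypothetical helper for illustration, not part of de_toolkit.
    """
    n_samples = len(counts[0])
    # Only genes with positive counts in every sample have a usable
    # (nonzero) geometric mean, so the others are skipped here.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has positive counts in every sample")
    # Log of each usable gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean.
        log_ratios = [
            math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy matrix: sample 2 is sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],  # ignored when estimating size factors
]
factors = size_factors(counts)
normalized = normalize(counts)
print(factors)
print(normalized)
```

After normalization, the first three genes have equal values in both samples, which is the behavior one would expect when the only difference between samples is sequencing depth.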
Module Documentation
- norm - Normalizing Count Matrices
- de - Differential Expression
  - Patsy-lite
- outlier - Outlier Identification
- transform - Count Transformation
- filter - Filtering Count Matrices
- stats - Count Matrix Statistics
  - JSON output format
  - summary - Summary Statistics
  - base - Basic statistics
  - coldist - Column-wise distribution of counts
  - rowdist - Row-wise distribution of counts
  - colzero - Column-wise distribution of zero counts
  - rowzero - Row-wise distribution of zero counts
  - entropy - Row-wise sample entropy calculation
  - pca - Principal Component Analysis
The following functionality is (or will be) implemented by the package (items in italics are not yet implemented):
- norm - Normalizing Count Matrices
  - DESeq2
  - trimmed mean
  - reference norm
  - library size
  - FPKM
  - user supplied
- de - Differential Expression
  - DESeq2
  - Firth’s Logistic Regression
  - t-test
- outlier - Outlier Identification
  - entropy
  - Cook’s distance
- transform - Count Transformation
  - DESeq2 Variance Stabilizing Transform
  - RUVSeq transformation
  - trim
  - shrink
- filter - Filtering Count Matrices
  - nonzero
  - mean
- stats - Count Matrix Statistics
  - summary
  - dist
  - PCA
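As an illustration of one entry in the list above, the following sketch computes a row-wise sample entropy. It assumes the common definition, the Shannon entropy (base 2) of a row's counts treated as proportions across samples. The toolkit's own definition may differ (for example in the log base or in how zeros are handled), and `row_entropy` is a hypothetical name.

```python
import math


def row_entropy(row):
    """Shannon entropy (bits) of one gene's counts across samples.

    Assumed definition for illustration, not de_toolkit's implementation.
    """
    total = sum(row)
    if total == 0:
        # An all-zero row carries no information; report zero entropy.
        return 0.0
    entropy = 0.0
    for count in row:
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Counts spread evenly over four samples give the maximum, log2(4) bits.
print(row_entropy([5, 5, 5, 5]))
# Counts concentrated in a single sample give zero entropy.
print(row_entropy([100, 0, 0, 0]))
```

Under this definition, rows with low entropy are dominated by one or a few samples, which is why entropy can serve as a signal for outlier identification.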
Installing
conda package
We suggest installing this package using anaconda on the bubhub channel:
conda install -c bubhub de_toolkit
Manual installation
If conda is not available, ensure the following packages are installed and available in your environment:
- Python packages (python>=3.5)
  - docopt
  - pandas
  - numpy
- R packages (R>=3.2)
  - docopt
The following packages are only required to use the corresponding submodule functions:
- R packages
  - DESeq2 (Bioconductor)
  - RUVSeq (Bioconductor)
  - logistf (CRAN)
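For a manual installation, a quick way to see which of the required Python packages are missing is a check like the one below. It is a minimal sketch using only the standard library. `missing_packages` is a hypothetical helper, and the check only confirms that a package can be found, not that its version is suitable.

```python
import importlib.util

# Python packages listed as required above.
REQUIRED = ["docopt", "pandas", "numpy"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```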
We suggest using anaconda to create an environment that contains the necessary software, e.g.:
conda create -n de_toolkit python=3.5
./install_conda_packages.sh
# if you want to use the R functions (Firth, DESeq2, etc.)
Rscript install_r_packages.sh
In development, when you want to run the toolkit, install it with the setup.py script:
python setup.py install
This should make detk and its subtools available on the command line. Whenever
you change the code, you will need to run this command again.
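To confirm that the install step put the commands on your PATH, a small standard-library check such as the following can help. It is a sketch only: `find_tools` is a hypothetical helper, and the two names checked are the ones mentioned in this document (`detk` and `detk-norm`).

```python
import shutil


def find_tools(names):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


tools = find_tools(["detk", "detk-norm"])
for name, path in tools.items():
    print(name, "->", path or "not found on PATH")
```

If a command is reported as not found, rerunning the install command in the active environment is the first thing to try.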