wrapr - Thin wrapper for running R scripts
Thin wrapper interface for running R scripts from detk. This is a replacement for rpy2, which is a heavy dependency fraught with danger and hardship.
Note
This module is mostly intended for internal use by detk when interacting with R. A CLI is provided because why not, but it is only intended for advanced cases where you want to commandline-ize an R script that fits into the interface. If you have a one-off R script that needs to be integrated into your workflow, it would probably be better to just write it in R. Caveat emptor.
Setup
The wrapr interface assumes that R and any necessary packages have already been installed by the user. If you are using conda, you can install R easily with:
$ conda install -c r r-base
Once R is installed, wrapr also requires the jsonlite R package:
$ R
> install.packages("jsonlite")
To verify that your environment is properly set up to use wrapr, run:
$ detk wrapr check
R found: True
R path: /usr/bin/Rscript
jsonlite found: True
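For scripts that need a similar check without going through the CLI, the snippet below is a rough sketch of how it could be done from Python. It is not detk's implementation: the function name `check_r_environment` and the dictionary keys are made up for illustration, and the only thing assumed about R is that `Rscript` accepts `-e` to evaluate an expression.

```python
import shutil
import subprocess


def check_r_environment(rscript="Rscript"):
    """Hypothetical helper: report whether Rscript and jsonlite are usable."""
    # Look up the executable on PATH; None means R is not installed or not visible
    path = shutil.which(rscript)
    result = {"r_found": path is not None, "r_path": path, "jsonlite_found": False}
    if path is None:
        return result
    # Ask R itself whether jsonlite can be loaded; exit status 0 means yes
    code = 'quit(status = if (requireNamespace("jsonlite", quietly = TRUE)) 0 else 1)'
    proc = subprocess.run([path, "-e", code], capture_output=True, text=True)
    result["jsonlite_found"] = proc.returncode == 0
    return result


if __name__ == "__main__":
    for key, value in check_r_environment().items():
        print(f"{key}: {value}")
```

When `Rscript` cannot be found, the sketch reports jsonlite as missing too, since there is no R to ask.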
The interface
wrapr implements a well-defined interface between detk and R through a bridge script. From the command line, the following inputs are possible:
$ detk-wrapr run \
--meta-in=/path/to/metadata \ # metadata filename corresponding to input counts
--meta-out=/path/to/metadata_out \ # filename where modified metadata should be written
--params-in=/path/to/params.json \ # JSON formatted file with parameters needed by R
--params-out=/path/to/output_params.json \ # filename where parameters/values can be passed back out of R
/path/to/rscript \ # R script written to use the interface
/path/to/input_counts \ # counts matrix
/path/to/output # filename where tabular output should be written
Arguments starting with -- are optional. Input metadata and counts should be tabular as accepted elsewhere by detk. The input parameters file should be JSON formatted, containing an object with fields that are mapped directly to R list members.
The bridge script makes the following variables available in the R environment where the R script is run:
- counts.fn: path to the counts file provided to detk-wrapr
- out.fn: path to the file where new counts will be written after R has operated on them; the user is expected to write to this file, e.g. write.csv(counts.mat, out.fn)
- params: an R list that contains the parameter values from the parameter JSON file
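One way to picture the bridge is as a short R preamble that defines these variables and then sources the user's script. The Python sketch below generates such a preamble. It illustrates the idea only and is not the bridge script shipped with detk: the function name `build_bridge_preamble` is hypothetical, and the use of `jsonlite::fromJSON` to build `params` is an assumption based on the jsonlite requirement.

```python
import json


def r_string(value):
    """Quote a Python string as an R string literal (JSON escaping is a close match)."""
    return json.dumps(str(value))


def build_bridge_preamble(script_path, counts_fn, out_fn, params_fn=None):
    """Hypothetical sketch: R code that sets up the variables, then runs the script."""
    lines = [
        f"counts.fn <- {r_string(counts_fn)}",
        f"out.fn <- {r_string(out_fn)}",
    ]
    if params_fn is None:
        # No parameter file given: expose an empty list so params$x is simply NULL
        lines.append("params <- list()")
    else:
        # JSON object fields become named members of an R list
        lines.append(f"params <- jsonlite::fromJSON({r_string(params_fn)})")
    lines.append(f"source({r_string(script_path)})")
    return "\n".join(lines)


print(build_bridge_preamble("pseudocount.R", "counts.csv", "out.csv", "params.json"))
```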
For example, say we wanted to write an R script that added a configurable pseudocount to every count in a counts matrix. We could write the JSON parameter file as follows:
{
"pseudocount": 1
}
And write the following R script, named pseudocount.R:
# counts.fn, params, and out.fn are already defined
mat <- read.csv(counts.fn,row.names=1)
new.mat <- mat + params$pseudocount
write.csv(new.mat,out.fn)
The command to execute this wrapr code might be:
$ detk wrapr run --params-in=pseudocount_params.json \
pseudocount.R counts.csv counts_plus_pseudo.csv
The file counts_plus_pseudo.csv will contain the result of the R script operation.
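If the inputs are generated from Python, the parameter file and the command line for this example could be prepared as in the sketch below. It uses only the standard library and does not invoke detk or R. The helper name `prepare_pseudocount_run` is hypothetical, and the file names are the ones used in the example.

```python
import json
import os
import tempfile


def prepare_pseudocount_run(workdir, pseudocount=1):
    """Hypothetical helper: write the params file and build the example command."""
    params_fn = os.path.join(workdir, "pseudocount_params.json")
    with open(params_fn, "wt") as f:
        # The JSON object's fields become members of the R `params` list
        json.dump({"pseudocount": pseudocount}, f, indent=2)
    # Same argument order as the example: script, input counts, output file
    cmd = [
        "detk", "wrapr", "run",
        f"--params-in={params_fn}",
        "pseudocount.R", "counts.csv", "counts_plus_pseudo.csv",
    ]
    return params_fn, cmd


with tempfile.TemporaryDirectory() as workdir:
    params_fn, cmd = prepare_pseudocount_run(workdir)
    print(" ".join(cmd))
```

The resulting list could be handed to `subprocess.run` once pseudocount.R and counts.csv exist in the working directory.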
API documentation
class de_toolkit.wrapr.WrapR(rscript_path, counts=None, metadata=None, params=None, output_fn=None, metadata_out_fn=None, params_out_fn=None, rpath=None, raise_on_error=True, routput_dir=None)
Wrapper object for calling R code with Rscript.
Note
The attributes are only populated after the execute() method has been run.
Parameters:
- rscript_path (str) – path to the R script to run
- counts (pandas.DataFrame, optional) – dataframe containing counts to be passed to R
- metadata (pandas.DataFrame, optional) – dataframe containing metadata to be passed to R
- params (dict, optional) – dict of parameters to be passed to R
- output_fn (str, optional) – path to the file where R should write output; if not provided, the output is written to a temporary file that is deleted when the WrapR object is deleted
- metadata_out_fn (str, optional) – path to the file where R should write metadata output
- rpath (str) – path to the Rscript executable, taken from the PATH environment variable if None
- raise_on_error (bool) – raise an exception if R encounters an error, otherwise fail silently and deadly
Attributes:
- output: pandas.DataFrame – dataframe of the tabular output created by the R script
- metadata_out: pandas.DataFrame – dataframe of the tabular metadata output created by the R script
- params_out: dict – dict of the output parameters list created by the R script
- stdout: str – string capturing the standard output of the R script
- stderr: str – string capturing the standard error of the R script
- retcode: int – return code of the R process
- success: bool – True if retcode == 0
Raises: de_toolkit.wrapr.RExecutionError – when raise_on_error is True, raised whenever R encounters an error
Examples
Basic usage accepts a path to an R script and loads the content of the file pointed to by out.fn in the R script into the output attribute:
>>> with open('script.R','wt') as f :
...     # note reference to implicitly defined *out.fn* R variable
...     f.write('write.csv(c(1,2,3,4),out.fn)')
>>> r = WrapR('script.R',output_fn='test.csv')
>>> r.execute()
>>> r.output
   x
1  1
2  2
3  3
4  4
>>> pandas.read_csv('test.csv',index_col=0)
   x
1  1
2  2
3  3
4  4
Can also use a context manager when the output doesn’t need to be written to a named file:
>>> with WrapR('script.R') as r :
...     r.execute()
...     print(r.output)
   x
1  1
2  2
3  3
4  4
The standard output of the R script can be accessed with the stdout attribute:
>>> with open('euler.R','wt') as f :
...     f.write('exp(complex(real=0,imag=pi))+1')
>>> with WrapR('euler.R') as r :
...     r.execute()
...     print(r.stdout)
[1] 0+1.224647e-16i
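The relationship between retcode, success, and raise_on_error can be illustrated without R installed. The sketch below is not WrapR's code: `run_script` and this local `RExecutionError` are stand-ins written for illustration, and the Python interpreter plays the role of Rscript so the example runs anywhere.

```python
import subprocess
import sys


class RExecutionError(Exception):
    """Local stand-in for de_toolkit.wrapr.RExecutionError."""


def run_script(cmd, raise_on_error=True):
    """Hypothetical sketch: run a command and collect the fields WrapR exposes."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    result = {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "retcode": proc.returncode,
        "success": proc.returncode == 0,  # same rule as the success attribute
    }
    if raise_on_error and not result["success"]:
        raise RExecutionError(result["stderr"])
    return result


# The Python interpreter stands in for Rscript here
ok = run_script([sys.executable, "-c", "print('hello')"])
failed = run_script(
    [sys.executable, "-c", "import sys; sys.exit(3)"], raise_on_error=False
)
print(ok["success"], failed["retcode"])
```

With raise_on_error disabled, the caller has to inspect success or retcode explicitly, which is why raising is the default.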
de_toolkit.wrapr.wrapr(Rcode, **kwargs)
Convenience wrapper for the WrapR object. Writes Rcode to a temporary file and executes it as if that file had been provided as the R script.
Parameters: Rcode (str) – string containing valid R code to be executed
Returns: A WrapR object executed with the code in the input string
Return type: obj
Examples
>>> with wrapr('write.csv(c(1,2,3,4),out.fn)') as r :
...     print(r.output)
   x
1  1
2  2
3  3
4  4
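The "temporary file" step can be sketched with the standard library alone. This is an assumption about the general approach, not the function's actual source, and `write_temp_rscript` is a hypothetical name.

```python
import os
import tempfile


def write_temp_rscript(rcode):
    """Hypothetical sketch: save R code to a temporary .R file and return its path."""
    # delete=False keeps the file on disk after closing so another process can read it
    with tempfile.NamedTemporaryFile("wt", suffix=".R", delete=False) as f:
        f.write(rcode)
        return f.name


path = write_temp_rscript("write.csv(c(1,2,3,4),out.fn)")
with open(path) as f:
    print(f.read())
os.remove(path)  # the caller is responsible for cleanup in this sketch
```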