wrapr - Thin wrapper for running R scripts

Thin wrapper interface for running R scripts from detk. This is a replacement for rpy2, which is a heavy dependency fraught with danger and hardship.

Note

This module is mostly intended for internal use by detk when interacting with R. A CLI is provided because why not, but it is only intended for advanced cases where you want to commandline-ize an R script that fits into the interface. If you have a one-off R script that needs to be integrated into your workflow, it would probably be better to just write it in R. Caveat emptor.

Setup

The wrapr interface assumes that R and any necessary packages have already been installed by the user. If you are using conda, you can install R easily with:

$ conda install -c r r-base

Once installed, wrapr also requires the jsonlite R package to be installed:

$ R
> install.packages("jsonlite")

To verify that your environment is properly set up to use wrapr, run:

$ detk wrapr check
R found: True
R path: /usr/bin/Rscript
jsonlite found: True
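
For readers who want to run a similar check from their own Python code, the following is a minimal sketch of what such a check could look like. It is not detk's implementation: the function name check_r_environment is hypothetical, and it only relies on the standard library plus R's own requireNamespace().

```python
import shutil
import subprocess


def check_r_environment(package="jsonlite"):
    """Report whether Rscript is on the PATH and whether an R package loads.

    Hypothetical helper for illustration; not part of detk.
    """
    rscript = shutil.which("Rscript")  # None when Rscript is not on the PATH
    pkg_found = False
    if rscript is not None:
        # requireNamespace() returns FALSE (rather than erroring) when the
        # package is missing, so the exit status encodes the answer.
        expr = (
            'quit(status = if (requireNamespace("%s", quietly = TRUE)) 0 else 1)'
            % package
        )
        try:
            proc = subprocess.run(
                [rscript, "-e", expr], capture_output=True, timeout=60
            )
            pkg_found = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            pkg_found = False
    return {
        "R found": rscript is not None,
        "R path": rscript,
        "%s found" % package: pkg_found,
    }


for key, value in check_r_environment().items():
    print("%s: %s" % (key, value))
```

If Rscript is missing, the package check is skipped and reported as False rather than raising.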

The interface

wrapr implements a well-defined interface between detk and R through a bridge script. From the command line, the following inputs are possible:

$ detk-wrapr run \
  --meta-in=/path/to/metadata \ # metadata filename corresponding to input counts
  --meta-out=/path/to/metadata_out \ # filename where modified metadata should be written
  --params-in=/path/to/params.json \ # JSON formatted file with parameters needed by R
  --params-out=/path/to/output_params.json \ # filename where parameters/values can be passed back out of R
  /path/to/rscript \ # R script written to use the interface
  /path/to/input_counts \ # counts matrix
  /path/to/output # filename where tabular output should be written

Arguments starting with -- are optional. Input metadata and counts should be tabular as accepted elsewhere by detk. The input parameters file should be JSON formatted, containing an object with fields that are mapped directly to R list members.
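
When the command is assembled programmatically, the optional and positional arguments described above can be put together as in the sketch below. The helper build_wrapr_command and its keyword names are hypothetical; only the flag names and the argument order come from the synopsis above, and the command is built but not executed.

```python
import json
import os
import shlex
import tempfile


def build_wrapr_command(rscript, counts, output, params=None, meta_in=None,
                        meta_out=None, params_out=None, workdir=None):
    """Return the argument list for a wrapr run (hypothetical helper).

    If params is a dict, it is written to a JSON file whose top-level
    object holds the fields that the R script sees as list members.
    """
    cmd = ["detk-wrapr", "run"]
    if params is not None:
        workdir = workdir or tempfile.mkdtemp()
        params_fn = os.path.join(workdir, "params.json")
        with open(params_fn, "wt") as f:
            json.dump(params, f)
        cmd.append("--params-in=" + params_fn)
    # The remaining options are plain filenames and are passed through as-is
    for flag, value in (("--meta-in", meta_in),
                        ("--meta-out", meta_out),
                        ("--params-out", params_out)):
        if value is not None:
            cmd.append("%s=%s" % (flag, value))
    # Positional arguments: R script, input counts, tabular output
    cmd += [rscript, counts, output]
    return cmd


cmd = build_wrapr_command("script.R", "counts.csv", "out.csv",
                          params={"pseudocount": 1})
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be handed to subprocess.run() unchanged, which avoids shell quoting problems with paths that contain spaces.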

The bridge script makes the following variables available in the R environment where the R script is run:

  • counts.fn: path to the counts file provided to detk-wrapr
  • out.fn: path to the file where new counts will be written after R has operated on them; the user is expected to write to this file, e.g. write.csv(counts.mat, out.fn)
  • params: an R list that contains parameter values as included in the parameter JSON file

For example, say we wanted to write an R script that added a configurable pseudocount to every count in a counts matrix. We could write the JSON parameter file as follows:

{
  "pseudocount": 1
}

And write the following R script, named pseudocount.R:

# counts.fn, params, and out.fn are already defined
mat <- read.csv(counts.fn, row.names=1)
new.mat <- mat + params$pseudocount
write.csv(new.mat,out.fn)

The command to execute this wrapr code might be:

$ detk wrapr run --params-in=pseudocount_params.json \
pseudocount.R counts.csv counts_plus_pseudo.csv

The file counts_plus_pseudo.csv will contain the result of the R script operation.
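
The same script can presumably be driven from Python through the WrapR class documented below. This is a sketch only: the wrapper function run_pseudocount is hypothetical, it uses just the constructor arguments, execute() method, and output attribute listed in the API documentation, and calling it requires detk, R, and jsonlite to be installed.

```python
def run_pseudocount(counts_df, pseudocount=1, rscript_path="pseudocount.R"):
    """Run pseudocount.R on a counts dataframe and return the result.

    Hypothetical wrapper; counts_df is expected to be a pandas.DataFrame.
    """
    # Imported lazily so that this sketch can be loaded without detk installed
    from de_toolkit.wrapr import WrapR

    with WrapR(rscript_path, counts=counts_df,
               params={"pseudocount": pseudocount}) as r:
        r.execute()
        # output holds the table that the R script wrote to out.fn
        return r.output


# Example call (needs a working detk and R installation):
#   new_counts = run_pseudocount(counts_df, pseudocount=1)
```

Because no output_fn is given, the result is read from a temporary file, so the dataframe is returned from inside the context manager.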

API documentation

class de_toolkit.wrapr.WrapR(rscript_path, counts=None, metadata=None, params=None, output_fn=None, metadata_out_fn=None, params_out_fn=None, rpath=None, raise_on_error=True, routput_dir=None)[source]

Wrapper object for calling R code with Rscript.

Note

The attributes are only populated after the execute() method has been run.

Parameters:
  • rscript_path (str) – path to the R script to run
  • counts (pandas.DataFrame, optional) – dataframe containing counts to be passed to R
  • metadata (pandas.DataFrame, optional) – dataframe containing metadata to be passed to R
  • params (dict, optional) – dict of parameters to be passed to R
  • output_fn (str, optional) – path to file where R should write output; if not provided, the output is written to a temporary file that is deleted when the WrapR object is deleted
  • metadata_out_fn (str, optional) – path to file where R should write metadata output
  • rpath (str) – path to the Rscript executable, taken from the PATH environment variable if None
  • raise_on_error (bool) – raise an exception if R encounters an error, otherwise fail silently and deadly

output

pandas.DataFrame – dataframe of the tabular output created by R script

metadata_out

pandas.DataFrame – dataframe of the tabular metadata output created by R script

params_out

dict – dict of the output parameters list created by R script

stdout

str – string capturing the standard output of the R script

stderr

str – string capturing the standard error of the R script

retcode

int – return code of the R process

success

bool – True if retcode == 0

Raises: de_toolkit.wrapr.RExecutionError – when raise_on_error is True, raised whenever R encounters an error
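
When raise_on_error is False, the success, retcode, and stderr attributes are the only signal that something went wrong. The sketch below shows one way to turn them into a readable message. The helper summarize_run is hypothetical, and a SimpleNamespace stands in for an executed WrapR object so the example runs without R.

```python
from types import SimpleNamespace


def summarize_run(r):
    """Build a short status report from the documented WrapR attributes."""
    if r.success:
        return "R finished successfully (return code %d)" % r.retcode
    # Fall back to a placeholder when R wrote nothing to standard error
    detail = (r.stderr or "").strip() or "no stderr captured"
    return "R failed with return code %d:\n%s" % (r.retcode, detail)


# Stand-in for a WrapR object created with raise_on_error=False and executed
fake_run = SimpleNamespace(
    success=False,
    retcode=1,
    stdout="",
    stderr="Error: object 'x' not found\nExecution halted\n",
)
print(summarize_run(fake_run))
```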

Examples

Basic usage accepts a path to an R script and loads the content of the file pointed to by out.fn in the R script into the output attribute:

>>> with open('script.R','wt') as f :
        # note reference to implicitly defined *out.fn*
        # R variable
        f.write('write.csv(c(1,2,3,4),out.fn)')
>>> r = WrapR('script.R',output_fn='test.csv')
>>> r.execute()
>>> r.output
   x
1  1
2  2
3  3
4  4
>>> pandas.read_csv('test.csv',index_col=0)
   x
1  1
2  2
3  3
4  4

You can also use a context manager when the output doesn’t need to be written to a named file:

>>> with WrapR('script.R') as r :
        r.execute()
        print(r.output)
   x
1  1
2  2
3  3
4  4

The standard output of the R script can be accessed with the stdout attribute:

>>> with open('euler.R','wt') as f :
        f.write('exp(complex(real=0,imag=pi))+1')
>>> with WrapR('euler.R') as r :
        r.execute()
        print(r.stdout)
[1] 0+1.224647e-16i

de_toolkit.wrapr.wrapr(Rcode, **kwargs)[source]

Convenience wrapper for WrapR object. Writes Rcode to a temporary file and executes it as it would if it were provided.

Parameters: Rcode (str) – string containing valid R code to be executed
Returns: A WrapR object executed with the code in input string
Return type: obj
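
The "write to a temporary file" step described above can be sketched with the standard library as follows. This illustrates the general technique and is not the actual detk implementation; the helper name write_rcode_to_tempfile is hypothetical.

```python
import os
import tempfile


def write_rcode_to_tempfile(rcode):
    """Write a string of R code to a temporary .R file and return its path."""
    # delete=False keeps the file on disk after the handle is closed, so a
    # separate Rscript process could read it; the caller removes it later.
    with tempfile.NamedTemporaryFile("wt", suffix=".R", delete=False) as f:
        f.write(rcode)
    return f.name


path = write_rcode_to_tempfile("write.csv(c(1,2,3,4),out.fn)")
with open(path) as f:
    print(f.read())
os.remove(path)  # clean up once the script is no longer needed
```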

Examples

>>> with wrapr('write.csv(c(1,2,3,4),out.fn)') as r :
        print(r.output)
   x
1  1
2  2
3  3
4  4

de_toolkit.wrapr.get_r_path()[source]

Return the path to Rscript found in the shell environment.

de_toolkit.wrapr.check_r()[source]

Tests whether the Rscript executable can be found.

de_toolkit.wrapr.check_r_package(pkg)[source]

Tests whether the R package pkg is installed.