When I try to run the notebook `2-drug-screening.ipynb` inside the Hackathon-provided Docker image, it fails on this line:
```
from rpy2.robjects import pandas2ri
```
The error is:
```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 from rpy2.robjects import pandas2ri
/opt/conda/lib/python3.7/site-packages/rpy2/robjects/__init__.py in
12 import types
13 import array
---> 14 import rpy2.rinterface as rinterface
15 import rpy2.rlike.container as rlc
16
/opt/conda/lib/python3.7/site-packages/rpy2/rinterface.py in
2 import atexit
3 import typing
----> 4 from rpy2.rinterface_lib import openrlib
5 import rpy2.rinterface_lib._rinterface_capi as _rinterface
6 import rpy2.rinterface_lib.embedded as embedded
/opt/conda/lib/python3.7/site-packages/rpy2/rinterface_lib/openrlib.py in
19
20
---> 21 rlib = _dlopen_rlib(R_HOME)
22
23
/opt/conda/lib/python3.7/site-packages/rpy2/rinterface_lib/openrlib.py in _dlopen_rlib(r_home)
12 """Open R's shared C library."""
13 if r_home is None:
---> 14 raise ValueError('r_home is None. '
15 'Try python -m rpy2.situation')
16 lib_path = rpy2.situation.get_rlib_path(r_home, platform.system())
ValueError: r_home is None. Try python -m rpy2.situation
```
Inside Jupyter, the closest we can get to running `python -m rpy2.situation` is the following:
```
import subprocess
import sys

# Run `python -m rpy2.situation` with the same interpreter as the Jupyter kernel
cmnd = [sys.executable, '-m', 'rpy2.situation']
try:
    out = subprocess.check_output(cmnd, stderr=subprocess.PIPE)
    print(out.decode(sys.getfilesystemencoding()))
except subprocess.CalledProcessError as e:
    # On a non-zero exit, show everything the script reported
    print('exit code: {}'.format(e.returncode))
    print('stdout: {}'.format(e.output.decode(sys.getfilesystemencoding())))
    print('stderr: {}'.format(e.stderr.decode(sys.getfilesystemencoding())))
```
This gives the result:
```
exit code: 1
stdout: Python version:
3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
Looking for R's HOME:
Environment variable R_HOME: None
Calling `R RHOME`: None
InstallPath in the registry: *** Only available on Windows ***
```
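If rpy2 itself can't be imported, the two lookups in this report can also be reproduced directly from a notebook cell. This is a minimal sketch, not part of rpy2: `find_r_home` is a hypothetical helper that checks the `R_HOME` environment variable and then asks an `R` executable on the `PATH` for `R RHOME`. It skips the Windows registry lookup entirely.

```python
import os
import shutil
import subprocess
from typing import Optional


def find_r_home() -> Optional[str]:
    """Hypothetical helper: look for R's home directory the way the
    rpy2.situation report does, first R_HOME, then `R RHOME`."""
    r_home = os.environ.get("R_HOME")
    if r_home:
        return r_home
    r_exe = shutil.which("R")
    if r_exe is None:
        # No R executable on PATH, matching "Calling `R RHOME`: None"
        return None
    result = subprocess.run(
        [r_exe, "RHOME"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    print("R_HOME:", find_r_home())
```

In the image described here, this would be expected to print `R_HOME: None`, since neither lookup succeeds.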
It seems that either R is not installed in the Docker image or the `R_HOME` environment variable has not been set. The [Dockerfile](https://github.com/Sage-Bionetworks/nf-hackathon-2019/blob/master/py_demos/Dockerfile) is as follows:
```
##Start from base jupyter notebook docker container
FROM jupyter/base-notebook
##jupyter does not have root user as default, so switch to root to use apt-get and pip3
USER root
RUN apt-get update
RUN apt-get -y install gcc
#upgrade pip and get synapseclient
RUN pip install --upgrade pip
RUN pip install synapseclient numpy pandas seaborn umap-learn matplotlib rpy2 adjustText hdbscan sklearn dfply bokeh
RUN mkdir /home/jovyan/work/output
```
This seems to rely entirely on `pip install rpy2` to provide a complete R environment and set `R_HOME`. That doesn't happen: rpy2 is only a bridge to an existing R installation and does not install R itself.
Let's look at the [Docker description for the R version of the notebooks](https://github.com/Sage-Bionetworks/nf-hackathon-2019/blob/master/R_demos/Dockerfile):
```
## Start from this Docker image
FROM rocker/tidyverse
## use rocker as a base image
## install synapser reqs
RUN apt-get update -y
RUN apt-get install -y dpkg-dev zlib1g-dev libssl-dev libffi-dev
RUN apt-get install -y curl libcurl4-openssl-dev
## install synapser
RUN R -e "install.packages('synapser', repos=c('http://ran.synapse.org', 'http://cran.fhcrc.org'))"
RUN R -e "install.packages('synapserutils', repos=c('http://ran.synapse.org', 'http://cran.fhcrc.org'))"
## install bioconductor packages
RUN R -e "install.packages('BiocManager')"
RUN R -e "BiocManager::install(c('GSVA', 'GSEABase', 'org.Hs.eg.db', 'limma', 'GOsummaries', 'GSVAdata', 'biomaRt', 'maftools'))"
## install cran packages
RUN R -e "install.packages(c('gProfileR', 'umap', 'dbscan', 'ggfortify', 'pheatmap', 'ggpubr', 'DT', 'here', 'reshape2', 'RColorBrewer'))"
RUN mkdir /home/rstudio/output
```
We see a complex set of Synapse-specific R package installations, which will not happen just by asking pip to install rpy2.
We need a single Dockerfile that fully configures the R installation first, then configures the Python installation, and then sets `R_HOME` before running Jupyter.
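One way to catch this at build time would be a small check run as the last step of such a Dockerfile, after `R_HOME` has been set. The sketch below is hypothetical (`check_r_install` and `main` are not part of rpy2 or the hackathon images) and assumes the Linux layout rpy2 expects, with R's shared library at `$R_HOME/lib/libR.so`; R must have been built as a shared library for that file to exist.

```python
import os
import sys


def check_r_install(r_home):
    """Return a list of problems with the R installation rpy2 would see.
    Assumes the Linux layout: $R_HOME/lib/libR.so."""
    problems = []
    if not r_home:
        problems.append("R_HOME is not set")
        return problems
    if not os.path.isdir(r_home):
        problems.append("R_HOME does not exist: " + r_home)
        return problems
    # On Linux, rpy2 loads R's shared C library from $R_HOME/lib/libR.so
    lib_path = os.path.join(r_home, "lib", "libR.so")
    if not os.path.isfile(lib_path):
        problems.append("missing shared library: " + lib_path)
    return problems


def main():
    issues = check_r_install(os.environ.get("R_HOME"))
    for issue in issues:
        print(issue, file=sys.stderr)
    # A non-zero return code is what would make a Docker RUN step fail
    return 1 if issues else 0
```

Saved as, for example, `check_r.py` with `raise SystemExit(main())` appended, it could run as a final `RUN python3 check_r.py` step, so that a missing R or an unset `R_HOME` fails the image build rather than the notebook.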
Created by Lars Ericson (lars.ericson)

Thanks for posting the update here!

We switched the notebooks over to feather and rebuilt the containers, so others should not encounter this problem in the containers either.

NOTE: This problem was resolved with an update to the notebooks to use feather instead of rpy2.

Long way around: I got this sorted. In rpy2 version 2.9.4, the R data frame returned by `readRDS` has no `query` method. But we are only using rpy2 to read the R data frame, and version 2.9.4 does come with a `to_csvfile` method. So we can do this:
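```
import os
import tempfile

import pandas as pd
import rpy2.robjects as robjects
import synapseclient

syn = synapseclient.login()  # assumes cached Synapse credentials

# Download the .rds file and read it with R's readRDS via rpy2
targetspath = syn.get('syn17091507')
readRDS = robjects.r['readRDS']
targets = readRDS(targetspath.path)

# Round-trip through a temporary CSV so pandas can take over
csv_path = os.path.join(tempfile.mkdtemp(), 'targets.csv')
targets.to_csvfile(csv_path, eol='\n')
targets = pd.read_csv(csv_path)

targets_filt = (targets
                .query('mean_pchembl > 6')
                .filter(["internal_id", "hugo_gene", "std_name"])
                .drop_duplicates())
targets_filt.head()
```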
So the trick is just to use rpy2 to write the R data frame out as a CSV file and then read it back in with pandas, at which point the `query` method is available.

I will have to wait for you to fix the Docker image, or I will need to switch my box from Windows to Ubuntu. The version of rpy2 you are using only exists as a development release, and it [does not support Windows](https://github.com/rpy2/rpy2/blob/master/rpy/situation.py); in particular, there is no Windows branch in this function:
```
def get_rlib_path(r_home: str, system: str) -> str:
    """Get the path for the R shared library."""
    if system == 'Linux':
        lib_path = os.path.join(r_home, 'lib', 'libR.so')
    elif system == 'Darwin':
        lib_path = os.path.join(r_home, 'lib', 'libR.dylib')
    else:
        raise ValueError('The system "%s" is not supported.')
    return lib_path
```
So it's Linux or Bust.
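For illustration only, here is what a Windows branch might look like. This is a hypothetical patch, not rpy2's code: it assumes a 64-bit R for Windows install, where the DLL lives at `bin/x64/R.dll` under R's home directory (`platform.system()` returns `'Windows'` there), and it also interpolates the system name into the error message, which the quoted version forgets to do. Resolving the path is probably not enough on its own to make rpy2 work on Windows.

```python
import os


def get_rlib_path(r_home: str, system: str) -> str:
    """Get the path for the R shared library (sketch with a Windows branch)."""
    if system == 'Linux':
        lib_path = os.path.join(r_home, 'lib', 'libR.so')
    elif system == 'Darwin':
        lib_path = os.path.join(r_home, 'lib', 'libR.dylib')
    elif system == 'Windows':
        # Assumption: 64-bit R for Windows ships its DLL as bin/x64/R.dll
        lib_path = os.path.join(r_home, 'bin', 'x64', 'R.dll')
    else:
        # Fill in the %s placeholder with the unsupported system name
        raise ValueError('The system "%s" is not supported.' % system)
    return lib_path
```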
Oh, I understand now. We generally opted to use native Python or R packages in most cases (e.g. umap-learn vs umap, or seaborn vs ggplot2). The only reason we used rpy2 in the Python notebook was to include a dataset that was provided as an R binary file. Some of the R packages installed don't have a Python equivalent, or were not needed for the Python notebook because we could use something different but equivalent.

I was guessing that if you don't install all these R packages, it won't work as expected, since you seem to be using Python mostly as a pass-through to R (it was just a guess):
```
RUN R -e "install.packages('synapser', repos=c('http://ran.synapse.org', 'http://cran.fhcrc.org'))"
RUN R -e "install.packages('synapserutils', repos=c('http://ran.synapse.org', 'http://cran.fhcrc.org'))"
RUN R -e "install.packages('BiocManager')"
RUN R -e "BiocManager::install(c('GSVA', 'GSEABase', 'org.Hs.eg.db', 'limma', 'GOsummaries', 'GSVAdata', 'biomaRt', 'maftools'))"
RUN R -e "install.packages(c('gProfileR', 'umap', 'dbscan', 'ggfortify', 'pheatmap', 'ggpubr', 'DT', 'here', 'reshape2', 'RColorBrewer'))"
```
Hi Lars: This was not an issue a few weeks ago when the containers were created and tested (and re-tested). I can, however, reproduce this issue on my end. This suggests to me that, at some point, there was a change in the parent Jupyter base-notebook Docker image that we built our image from.
>We see a complex set of Synapse-specific R package installations which will not happen just by asking Anaconda to install RPY2.
I'm not sure what you mean by this. The issue you are encountering does not appear to be related to the R synapser package or the Python synapseclient.
I will look into the rpy2 issue and get back to you. It might be as easy as referencing an older jupyter base-notebook.