Simple process for sharing R code via Docker
1 Introduction
Things that can differ between team members the version of R, the operating system, packages, libraries, environment variables… !
Lets assume you have an R
code program, say peng.R
, that you’re written to analyze some data. You now want to share the code with a colleague, we’ll call him Joe. How to proceed?
The simplest option is simply to send Joe the “R” file containing the code via the most convenient method (e.g. email/text/slack/discord/github/USB drive etc.)
The next step will be for Joe to ( attempt to ) load and run the code. Typically he would do this either using the IDE Rstudio.app
to open the file and knit
it, or run it from the command line with the command:
> R -e "source('fig.R)"
Sometimes this approach works. Joe can add comments or expand the code and reply to you, and all is well. Frequently, however, this naive process will fail for any number of reasons. Even when it runs its not guaranteed that Joe will get the same results you did. Issues such as package versioning, the version of R, the compiler settings when R was compiled, .Rprofile settings can all impact results.
Ideally to facilitate reproducibility Joe will have as similar a computing environment as you, the original developer. This can be difficult to achieve, especially given the dynamic nature of open source software. For example Joe may have an outdated version of R
installed on his workstation, or his R
environment may be missing a necessary package. Additiional potential problems include: the required package may be present but its the wrong version, the program may need to source an additional file thats missing, or the program load some data that it can’t find on Joe’s machine.
All of these problems go away if instead of sending the program as a standalone text file you send it as a docker image. In this post we’ll walk through the process of dockerizing the R code.
Assume a simple Rmd
file like this:
zz.sum.max <- function(x, extra = F) {
if (!is.numeric(x)) {
cat("Error: Numeric arrays required.")
return(1)
}
x <- x[!is.na(x)]
N <- length(x)
options(digits = 4)
if (!N) {
return(c(0))
}
Mean <- mean(x)
V <- var(x)
SD <- V^.5
SE <- V^.5 / (N^.5)
Min <- min(x)
Median <- quantile(x, probs = c(.5), names = F)
Max <- max(x)
errmin2 <- Mean - 1.96 * SE
errmax2 <- Mean + 1.96 * SE
errmin1 <- Mean - SE
errmax1 <- Mean + SE
if (extra) {
return(c(N = round(N), Mean = Mean, SE = SE, ERRMN1 = errmin1, ERRMX1 = errmax1, ERRMN2 = errmin2, ERRMX2 = errmax2, SD = SD, min = Min, max = Max))
} else {
return(c(N = round(N), Mean = Mean, Median = Median, SE = SE, SD = SD, min = Min, max = Max))
}
}
dat1 <- read.csv("orth.csv")
out <- zz.sum.max(dat1$age)
print(out)
3 Docker approach
Alternatively, consider the “Docker” approach.
Before sending peng.Rmd to Joe we’ll dockerize it.
- Prepare a work directory: penguins. We want to send Joe a container that has R and all the preliminaries taken care of so that all he has to do is
Here is the docker file
FROM rhub/r-minimal
ENV MRAN_BUILD_DATE=2024-02-01 # Install Basic Utility R Packages
RUN installr -r https://cran.microsoft.com/snapshot/${MRAN_BUILD_DATE} \
--error -d -t "zlib-dev" shiny
RUN addgroup --system joe && adduser --system --ingroup joe joe
RUN chown joe:joe -R /home/joe
USER joe
WORKDIR /home/joe
RUN mkdir -p /home/joe/shr
RUN mkdir -p /home/joe/output
COPY fig.R /home/joe/shr
COPY orth.csv /home/joe/shr
CMD ["/bin/bash"]
run docker
docker build -t rgt47/penguin_review --platform=linux/amd64 .
docker push rgt47/peng_review
relay image to Joe
docker push rgt47/peng_review
or
docker save rgt47/peng_review | gzip > peng_review_trans.tgz
docker load -i peng_review_trans.tgz
> docker pull rgt47/penguin_review
> droot="$PWD"/output docker run -it --rm --platform linux/x86_64 \
$droot:/home/joe/output peng_review
-v > cd output
> library(rmarkdown); render('../shr/peng.Rmd')
Important to include the association between the /home/joe/output directory in the container with the output directory on the local workstation. Thats where the results of the analysis will be saved.
> R -e "library(rmarkdown); render('peng.Rmd')"
and if he wants to edit peng.Rmd
> vim peng.Rmd
\usepackage[export]{adjustbox}
\usepackage{fancyhdr}
\usepackage{titling}
\pagestyle{fancy}
\pretitle{
\begin{flushright}
\includegraphics[width=3cm,valign=c]{sudoku.png}\\
\end{flushright}
\begin{flushleft} \LARGE }
\posttitle{\par\end{flushleft}\vskip 0.5em}
\predate{\begin{flushleft}\large}
\postdate{\par\end{flushleft}}
\preauthor{\begin{flushleft}\large}
\postauthor{\par\end{flushleft}}
\fancyfoot[L]{\currfilename} %put date in header
\fancyfoot[R]{\includegraphics[width=.8cm]{sudoku.png}}
\fancyhead[L]{\today} %put current file in footer
4 REFERENCES
Reuse
Citation
@online{(ryy) glenn thomas2024,
author = {(Ryy) Glenn Thomas, Ronald},
title = {Simple Process for Sharing {R} Code via {Docker}},
date = {2024-02-24},
url = {https://focusonr.org/posts/share_R_code_via_docker},
langid = {en}
}