Hi,
In support of reproducible research at my Institute, I seek an
approach to re-creating the R environment in which an analysis has
been conducted: that is, the exact version of R and the exact versions
of all packages used in a particular R session.
I am seeking comments/criticism of this as a goal, and of the
following outline of an approach:
=== When all the steps of a workflow have been finalized ===
* re-run the workflow from beginning to end
* save the results of sessionInfo() into an RDS file named after the
current date and time.
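
A minimal sketch of this recording step, assuming the file goes into a
directory of your choosing. The helper name save_session_info and the
file name pattern are hypothetical, not part of any package:

```r
# Hypothetical helper: write sessionInfo() to an RDS file whose name
# carries the current date and time.
save_session_info <- function(dir = ".") {
  stamp <- format(Sys.time(), "%Y-%m-%d_%H%M%S")
  path <- file.path(dir, paste0("sessionInfo_", stamp, ".rds"))
  saveRDS(sessionInfo(), path)
  invisible(path)
}

# Example: save into a temporary directory and read it back.
path <- save_session_info(tempdir())
old <- readRDS(path)
```

Running this as the last step of the workflow means the record reflects
the packages that were actually loaded by the analysis.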
=== Later, when desirous of exactly recreating this analysis ===
* read the (old) sessionInfo() into an R session
* exit with failure if the running version of R doesn't match
* compare the old sessionInfo to the currently available installed
libraries (i.e. using packageVersion)
* where there are discrepancies, install the required version of the
package (without dependencies) into a new library (named after the old
sessionInfo RDS file)
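
The checking steps above might be sketched roughly as follows. The
helper name find_discrepancies is hypothetical; it relies only on the
otherPkgs and loadedOnly components of a sessionInfo object and on
packageVersion. The actual installation of old versions is left out,
since that depends on the tool used:

```r
# Hypothetical helper: given an old sessionInfo object, stop if the R
# version differs, otherwise return a data frame of packages whose
# recorded version is missing or differs from what is installed now.
find_discrepancies <- function(old) {
  if (!identical(old$R.version$version.string, R.version$version.string)) {
    stop("R version mismatch: recorded ", old$R.version$version.string,
         ", running ", R.version$version.string)
  }
  # Attached and merely loaded packages both carry a Version field.
  recorded <- c(old$otherPkgs, old$loadedOnly)
  # Normalise through package_version so "1.2-3" and "1.2.3" compare equal.
  wanted <- vapply(recorded,
                   function(d) as.character(package_version(d$Version)),
                   character(1))
  have <- vapply(names(wanted), function(p) {
    tryCatch(as.character(packageVersion(p)),
             error = function(e) NA_character_)  # package not installed
  }, character(1))
  mismatch <- is.na(have) | have != wanted
  data.frame(package = as.character(names(wanted)),
             wanted = wanted,
             installed = have,
             stringsAsFactors = FALSE)[mismatch, , drop = FALSE]
}

# Example: compared against itself, the current session has no discrepancies.
todo <- find_discrepancies(sessionInfo())
```

Each row of the result is a package that would need to be installed, at
the version in the wanted column, into the new library.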
Then the analyst should be able to put the new library at the front
of .libPaths and run the analysis, confident that the same versions of
the packages are in use.
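
For illustration, putting such a library first on the search path could
look like this. The directory name is hypothetical and a temporary
directory stands in for wherever the library would really live:

```r
# Hypothetical library directory, named after the saved sessionInfo file.
lib_dir <- file.path(tempdir(), "sessionInfo_2014-01-01_120000_lib")
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Prepend it so packages installed there are found before any others.
.libPaths(c(lib_dir, .libPaths()))
```

Packages not present in the new library would still be picked up from
the remaining entries of .libPaths.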
I have in the past used install-package-version.R to revert to
previous versions of R packages successfully. And there is a similar
tool in Hadley Wickham's devtools.
But, I don't know if I need something special for (BioConductor)
packages that have been installed using biocLite and seek advice here.
I do understand that the R environment is not sufficient to guarantee
reproducibility. Some of my colleagues have suggested saving a
virtual machine with all your software/library/data installed. So, I
am also in general interested in what other people are doing to this
end. But I am most interested in:
* is this a good idea
* is there a worked out solution
* does biocLite introduce special cases
* where do the dragons lurk
... and the like
Any tips?
Thanks,
~ Malcolm Cook
Stowers Institute / Computation Biology / Shilatifard Lab