Background: I am creating a new package, let's call it AnalysisPackage, to be submitted to Bioconductor. AnalysisPackage plots maps of the human brain colored by the enrichment or depletion of gene sets. The maps for each brain region are pretty large. In total, if I were to save all of this in sysdata.rda, the file would be ~25MB. This is far over the 4MB limit for Bioconductor packages, so I have generated an additional ExperimentData package, let's call it DataPackage, whose data I now retrieve with the ExperimentHub() function.
The problem: There are multiple functions in AnalysisPackage that require data that's stored in DataPackage. There are also nested functions in AnalysisPackage. This means that every time the end user runs a function there are 6-12 repetitive calls to ExperimentHub. This adds a frustrating amount of time to each process.
The question: Is there a way to automatically load the data from DataPackage into memory when the user runs library(AnalysisPackage) so that I don't have to continuously interact with ExperimentHub? Alternatively, has anyone found any different solution around this type of problem? The only thing that I can come up with is passing the data from one function to another, though that would create an unnecessarily large data object for the end user to have to deal with. This doesn't seem like the optimal strategy.
Thanks in advance, Sara
I read your question as trying to avoid the cost of reading the data from disk. One option is to 'memoize' data. A simple example is
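A minimal sketch of memoization in base R, assuming a hypothetical `slow_load()` that stands in for an expensive read. The names `memoize()`, `slow_load()`, and `fast_load()` are illustrative only and do not come from any package:

```r
# Wrap a function of one character key so that each key is computed only once.
memoize <- function(f) {
    cache <- new.env(parent = emptyenv())   # private to the returned closure
    function(key) {
        stopifnot(is.character(key), length(key) == 1L)
        if (!exists(key, envir = cache, inherits = FALSE)) {
            assign(key, f(key), envir = cache)
        }
        get(key, envir = cache, inherits = FALSE)
    }
}

# Hypothetical stand-in for an expensive load; counts how often it really runs.
n_calls <- 0L
slow_load <- function(id) {
    n_calls <<- n_calls + 1L
    Sys.sleep(0.1)
    paste("data for", id)
}

fast_load <- memoize(slow_load)
fast_load("EH123")   # first call runs slow_load()
fast_load("EH123")   # second call is served from the cache
```

After these two calls `n_calls` is 1, because the second lookup never reaches `slow_load()`.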
The idea would be to write an (internal) helper function such as
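One way such a helper might look, as a sketch rather than a definitive implementation. It assumes a package-level environment named `.cache`, a helper named `.get_data()`, and the placeholder resource id "EH123"; the `loader` argument is an addition for illustration so the fetch step can be swapped out in tests:

```r
# Package-level environment that lives for the duration of the R session.
.cache <- new.env(parent = emptyenv())

# Return the resource for `id`, fetching it from ExperimentHub only on first use.
# The default loader is evaluated lazily, so ExperimentHub is only touched when
# a resource is actually missing from the cache.
.get_data <- function(id,
                      loader = function(id) ExperimentHub::ExperimentHub()[[id]]) {
    if (!exists(id, envir = .cache, inherits = FALSE)) {
        assign(id, loader(id), envir = .cache)
    }
    get(id, envir = .cache, inherits = FALSE)
}
```

Functions in AnalysisPackage would then call `.get_data("EH123")` wherever they need the data, and nested calls reuse the cached copy instead of going back to ExperimentHub.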
This loads data on first use, so not all users would pay the cost of loading data. One would want to take additional precautions if this were to be used in a parallel evaluation context.
There are likely other approaches, e.g., using an .onLoad() function to load data
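A sketch of the `.onLoad()` variant, assuming the same kind of package-level `.cache` environment. The names `.data_ids` and `.fill_cache()` and the id "EH123" are placeholders introduced here for illustration:

```r
# Package-level environment filled once, when the package namespace is loaded.
.cache <- new.env(parent = emptyenv())

# Placeholder ids; replace with the real ExperimentHub ids of DataPackage.
.data_ids <- c("EH123")

# Store fetch(id) in the cache under each id.
.fill_cache <- function(ids, fetch) {
    for (id in ids) {
        assign(id, fetch(id), envir = .cache)
    }
    invisible(.cache)
}

# Called by R when the namespace is loaded, e.g. via library(AnalysisPackage).
.onLoad <- function(libname, pkgname) {
    hub <- ExperimentHub::ExperimentHub()
    .fill_cache(.data_ids, function(id) hub[[id]])
}
```

The trade-off compared with loading on first use is that every user pays the full loading cost at `library()` time, even if they never call a function that needs the data.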
And then reference `.cache[["EH123"]]` in your code.

Questions about package development are better addressed to the bioc-devel mailing list.
`memoize` is super neat, thanks Martin.
Yes, `memoize` is super neat!

Thank you so much, Martin, this is a perfect answer! Also, I apologize for posting to the wrong list. I'll be sure to post to bioc-devel next time.