Arnaud Amzallag
Dear IRanges developers,
runsum is a very fast and convenient function for computing on Rle
objects such as coverage vectors. However, when it is run on several
chromosomes and several samples, it can become very memory intensive.
For instance, on human chromosome 1 it outputs a vector of length about
250 million, so for several full genomes it quickly adds up to billions
of numbers in memory.
Often, however, you don't need single-base resolution. I wanted to
suggest, if it is possible, adding a parameter that lets the sliding
window advance by a user-defined step, rather than always by "step=1"
as it does now. That way, runsum(myRle, k=1e4, step = 1000) would
return the equivalent of a wig file, with one 10-kilobase window sum
per 1-kilobase step along the genome, without saturating the memory of
the server.
I tried sum(Views(myRle, ir)); it is less memory intensive, but it is
much slower. The proposed improvement would give the best of both
worlds: fast and memory efficient.
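To make the request concrete, here is a minimal sketch of the behaviour I
have in mind, emulated with the Views approach mentioned above. Note that
`step` is not an existing argument of runsum; the toy coverage `cov`, the
window size `k`, the variable `step`, and the other names are made up for
illustration only.

```r
library(IRanges)

# Toy coverage vector of length 30 (stand-in for a real chromosome coverage)
cov <- Rle(c(0L, 1L, 2L, 1L), c(5L, 10L, 3L, 12L))

k    <- 10L  # window width
step <- 5L   # hypothetical step size (not a real runsum argument)

# Window starts spaced `step` apart, keeping every window inside the vector
starts <- seq(1L, length(cov) - k + 1L, by = step)
win    <- IRanges(start = starts, width = k)

# One sum per window: what runsum(cov, k, step = step) would ideally return
stepped <- viewSums(Views(cov, win))

# Same values obtained by computing the full runsum and keeping every
# `step`-th element, which is the memory-hungry route on a real genome
full <- runsum(cov, k)
same <- all(as.numeric(as.vector(full)[starts]) == as.numeric(stepped))
```

With the default endrule, runsum returns one value per window start, so
element i of `full` is the sum over positions i to i + k - 1; that is why
indexing by `starts` lines up with the Views-based result.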
kind regards,
Arnaud Amzallag
Research Fellow
Mass General Cancer Center / Harvard Medical School