Arnaud Amzallag
Hello Hervé and Michael,
On Jun 29, 2011, at 6:29 PM, Michael Lawrence wrote:
>
>
> 2011/6/28 Hervé Pagès <hpages@fhcrc.org>
> Hi Michael, Arnaud,
>
>
> On 11-05-25 08:21 AM, Arnaud Amzallag wrote:
> Thank you Michael,
>
> Because of an email filter, I saw your reply only now.
>
> I reckon that in my case it would have been much smoother if sum()
> called viewSums() by default, and I agree that "It's more intuitive to
> think of an RleViews like an RleList rather than an IntegerList." I
> would support that change.
>
> I've recently made some changes in that direction in IRanges devel.
> This subtle shift in the nature of an RleViews (from IntegerList
> to RleList) has some deep consequences on the hierarchy of classes
> in IRanges. The most important one is that the Views class doesn't
> extend the IRanges class anymore. Views is now a direct subclass
> of List. This means that you cannot directly manipulate a Views object
> as if it was an IRanges object but you must extract its ranges (with
> ranges()) first.
>
>
> This is pretty unfortunate. We lose the ability to e.g. store the
> coverage inside RangedData (because a ViewsList is a RangesList). Is
> it really necessary for what was proposed?
>
I don't really have a point of view on this. I am curious, though:
are you keeping coverage and annotation data in the same object,
Michael? Does it have some advantage? I am simply constructing the
Views with the annotation each time I need it.
>
> This is a work-in-progress and a few things still need to be polished.
>
> Also I agree that we should just have "min", "max", "sum", "mean" etc.
> methods for Views objects. No need for viewMins, viewMaxs, viewSums etc.
> I'll change this.
>
Thank you Hervé, this is probably a good idea.
Thank you all for the good work !
Arnaud
> Cheers,
> H.
>
>
> Also it is possible that before I was summing the values of the Rle and
> did not notice the difference because my Rle was made of a lot of very
> short runs.
>
> Arnaud
>
> On Tue, May 10, 2011 at 8:44 AM, Michael Lawrence
> <lawrence.michael@gene.com> wrote:
>
> Good to hear that helped. One might expect sum() to simply call
> viewSums(), but the semantics are a bit strange here. The reason sum()
> works on Views is that a Views is a Ranges and thus an IntegerList
> (where each range encodes a sequence of integers). The weird thing is
> that the elements of a Views are not the sequence of integers covered
> but rather the values in the Rle. That everything works as you expected
> is just a coincidence of dispatch.
>
> For usability we should probably have max(), min(), and sum() just use
> viewMaxs, viewMins and viewSums. It's more intuitive to think of an
> RleViews like an RleList rather than an IntegerList.
>
>
> On Sun, May 8, 2011 at 1:44 PM, Arnaud Amzallag
> <arnaud.amzallag@gmail.com> wrote:
>
> Thank you Michael, the function viewSums was exactly what I needed !
>
> 0.014 seconds for viewSums(Views(myrle, ir)) vs 54 seconds for
> sum(Views(myrle, ir)) on chr22, one sample. I use this now instead of
> runsum: no memory problem, and probably even faster. For the full
> genome on many samples that will surely help. Maybe I should have read
> a bit more about Views.
>
> About the result of runsum, I did see a lot of memory usage when I split
> the process with mclapply. The result is indeed an Rle. After looking
> closer, the resulting Rle has many more runs than the original one. That
> makes sense, because runsum is a kind of smoothing function, and the
> resulting signal has many more levels than the original one.
>
> Kind regards,
>
> Arnaud
>
> On May 6, 2011, at 10:42 PM, Michael Lawrence wrote:
>
>
>
> On Fri, May 6, 2011 at 2:54 PM, Arnaud Amzallag
> <arnaud.amzallag@gmail.com> wrote:
>
> Dear IRanges developers,
>
> runsum is a very fast and convenient function to compute on Rle
> coverages, for instance. However, when it is run on several chromosomes
> and several samples, it can get very memory intensive. For instance, on
> human chromosome 1 it outputs a vector of length 250 million, so for
> several full genomes it is quickly billions of numbers in memory.
>
>
> I would have expected the result to be an Rle, which would be fairly
> memory efficient.
>
>
> However, often you don't need single-base resolution. I wanted to
> suggest, if it is possible, adding a parameter that lets the sliding
> window advance by a user-defined step, rather than always by "step=1"
> as it does now, so that runsum(myRle, k=1e4, step = 1000) would return
> the equivalent of a wig file, for each 10 kilobases of the genome,
> without saturating the memory of the server.
>
> I tried sum(Views(myRle, ir)); it is less memory intensive but much
> slower. So that improvement would give the best of both worlds: fast
> and memory efficient.
>
>
> Have you tried viewSums(Views(myRle, ir))?
>
>
> kind regards,
>
> Arnaud Amzallag
> Research Fellow
> Mass General Cancer Center / Harvard Medical School
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages@fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
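A minimal sketch of the stepped sliding window suggested earlier in the thread (the hypothetical runsum(myRle, k=1e4, step = 1000)). It uses base R only, on a plain values/lengths pair rather than an IRanges Rle, and the function name stepped_window_sums is made up for illustration. With IRanges loaded, the same windows could presumably be built as ir <- IRanges(start = starts, width = k) and passed to viewSums(Views(myRle, ir)), as discussed above.

```r
# Stepped window sums over a run-length encoded vector, base R only.
# `values` and `lengths` describe the runs (as in base::rle), `k` is the
# window width and `step` the stride. The runs are never expanded.
# Note: this is an illustrative helper, not part of IRanges.
stepped_window_sums <- function(values, lengths, k, step) {
  n <- sum(lengths)
  run_ends <- cumsum(lengths)             # last position covered by each run
  run_cumsum <- cumsum(values * lengths)  # running total at the end of each run

  # prefix(p): sum of positions 1..p, looked up from the run boundaries
  prefix <- function(p) {
    # index of the run that contains position p
    i <- findInterval(p, run_ends, left.open = TRUE) + 1L
    before <- c(0, run_cumsum)[i]         # total of all runs before run i
    run_start <- c(0, run_ends)[i]        # position just before run i begins
    before + values[i] * (p - run_start)
  }

  starts <- seq(1, n - k + 1, by = step)  # window start positions
  prefix(starts + k - 1) - prefix(starts - 1)
}

# Toy example: 25 positions, windows of width 10 moved in steps of 5
values <- c(0, 3, 1)
lengths <- c(5, 12, 8)
stepped_window_sums(values, lengths, k = 10, step = 5)
```

The output has one number per window rather than one per base, so its size shrinks by roughly a factor of `step` compared with a step-1 running sum.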