Entering edit mode
Hi Michael, Arnaud,
On 11-05-25 08:21 AM, Arnaud Amzallag wrote:
> Thank you Michael,
>
> for some email filters reasons I saw your reply only now.
>
> I recon that in my case that would have been much smoother if sum()
would
> call viewSums() by default and I agree that "It's more intuitive to
think of
> an RleViews like an RleList rather than an IntegerList.". I would
support
> that change.
I've recently made some changes in that direction in IRanges devel.
This subtle shift in the nature of an RleViews (from IntegerList
to RleList) has some deep consequences on the hierarchy of classes
in IRanges. The most important one is that the Views class doesn't
extend the IRanges class anymore. Views is now a direct subclass
of List. This means that you cannot directly manipulate a Views object
as if it was an IRanges object but you must extract its ranges (with
ranges()) first.
This is a work-in-progress and a few things still need to be polished.
Also I agree that we should just have "min", "max", "sum", "mean" etc
methods for Views objects. No need for viewMins, viewMaxs, viewSums
etc
I'll change this.
Cheers,
H.
>
> Also it is possible that before I was summing the values of the Rle
and did
> not notice the difference because my Rle was made of a lot of very
short Rle
> lengths.
>
> Arnaud
>
> On Tue, May 10, 2011 at 8:44 AM, Michael Lawrence<lawrence.michael at="" gene.com="">> wrote:
>
>> Good to hear that helped. One might expect sum() to simply call
viewSums(),
>> but the semantics are a bit strange here. The reason sum() works on
Views is
>> that a Views is a Ranges and thus an IntegerList (where each range
encodes a
>> sequence of integers). The weird thing is that the elements of a
Views are
>> not the sequence of integers covered but rather the values in the
Rle. That
>> everything works as you expected is just a coincidence of dispatch.
>>
>> For usability we should probably have max(), min(), and sum() just
use
>> viewMaxs, viewMins and viewSums. It's more intuitive to think of an
RleViews
>> like an RleList rather than an IntegerList.
>>
>>
>> On Sun, May 8, 2011 at 1:44 PM, Arnaud Amzallag<arnaud.amzallag at="" gmail.com="">>> wrote:
>>
>>> Thank you Michael, the function viewSums was exactly what I needed
!
>>>
>>> 0.014 seconds for viewSums(Views(myrle, ir)) vs 54 seconds for
>>> sum(Views(myrle, ir)) on chr22, one sample. I use this now instead
of of
>>> runsum, no problem of memory, and probably even faster. for full
the genome
>>> on many samples that will surely help. Maybe I should have read a
bit more
>>> about the Views.
>>>
>>> About the result of runsum, I did see a lot of memory usage when I
split
>>> the process with mclapply. The result is indeed a Rle. After
looking closer,
>>> the resuting Rle has much more runs that the original one. That
makes sense,
>>> because runsum is a kind of smoothing function, and the resulting
signal has
>>> much more levels than the original one.
>>>
>>> Kind regards,
>>>
>>> Arnaud
>>>
>>> On May 6, 2011, at 10:42 PM, Michael Lawrence wrote:
>>>
>>>
>>>
>>> On Fri, May 6, 2011 at 2:54 PM, Arnaud Amzallag<
>>> arnaud.amzallag at gmail.com> wrote:
>>>
>>>> Dear IRanges developers,
>>>>
>>>> runsum is a very fast and convenient function to compute on Rle
>>>> coverages, for instance. However when it is run on several
chromosomes and
>>>> several samples, it can get very memory intensive. For instance
on human
>>>> chromosome 1, it outputs a vector of length 250 millions, so for
several
>>>> full genomes it is quickly billions of numbers in memory.
>>>>
>>>>
>>> I would have expected the result to be an Rle, which would be
fairly
>>> memory efficient.
>>>
>>>
>>>> However, often you don't need a single base resolution. I wanted
to
>>>> suggest, if it is possible, to add a parameter by which one could
have the
>>>> sliding window to slide by a user defined step, rather than
always "step=1",
>>>> as it is now. Such that runsum(myRle, k=1e4, step = 1000) would
return the
>>>> equivalent of a wig file, for each 10 kilobases of the genome,
without
>>>> saturating the memory of the server.
>>>>
>>>> I tried with sum(Views(myRle, ir)), it is less memory intensive
but it is
>>>> much slower. So that amelioration would give the best of both
worlds, fast
>>>> and memory efficient.
>>>>
>>>>
>>> Have you tried viewSums(Views(myRle, ir))?
>>>
>>>
>>>> kind regards,
>>>>
>>>> Arnaud Amzallag
>>>> Research Fellow
>>>> Mass general Cancer Center / Harvard Medical school
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>
>>>
>>>
>>>
>>
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319