Generic functions for DataFrames of Rle objects?
@charles-plessy-7857
Japan

Dear BioC developers and community,

I am using more and more DataFrames of Rle values, typically for transcriptome expression data, and I end up writing more and more functions that take a DataFrame, lapply a function that unpacks each Rle, apply a second function, repack the result as an Rle, and convert the resulting list into a DataFrame. I was wondering (I searched and did not find anything) whether there are already classes or packages that provide this functionality, or that provide methods such as colSums, rowsum, cor, etc., adapted to be efficient in that context.
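For readers who have not met this pattern, a minimal sketch of the unpack/apply/repack loop described above might look as follows. The helper name map_rle_columns and the toy data are illustrative assumptions, not code from an existing package.

```r
library(S4Vectors)

# Hypothetical helper (the name is illustrative): apply FUN to the decoded
# values of each Rle column, then re-encode each result as an Rle and
# reassemble the columns into a DataFrame.
map_rle_columns <- function(DF, FUN, ...) {
  cols <- lapply(DF, function(col) Rle(FUN(decode(col), ...)))
  do.call(DataFrame, cols)
}

# Toy expression table with two Rle columns (made-up values).
DF <- DataFrame(a = Rle(c(0L, 0L, 3L, 3L, 3L)),
                b = Rle(c(1L, 1L, 1L, 0L, 2L)))

res <- map_rle_columns(DF, log1p)
```

Every function written this way repeats the same decode and re-encode boilerplate, which is what motivates the question.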

Have a nice day!

DataFrame Rle generic expression table • 1.7k views
@herve-pages-1542
Seattle, WA, United States

Hi Charles,

FWIW the DelayedArray package lets you manipulate a DataFrame of Rle columns as a matrix-like object: just wrap it in a DelayedArray object by calling DelayedArray() on it. See ?DelayedArray for more information. After applying (delayed) operations to it, you can turn the DelayedArray object back into a DataFrame of Rle columns by coercing it to DataFrame, i.e. by calling as() with "DataFrame" as the target class. Note that I just added this coercion method in DelayedArray 0.2.5. This new version of the package should become available via biocLite() in 48h or less.
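A minimal sketch of that round trip, assuming a DelayedArray version that includes the coercion method mentioned above; the toy data and variable names are illustrative:

```r
library(S4Vectors)
library(DelayedArray)

# Toy DataFrame of Rle columns (made-up values).
DF <- DataFrame(a = Rle(c(0L, 0L, 3L, 3L, 3L)),
                b = Rle(c(1L, 1L, 1L, 0L, 2L)))

# Wrap the DataFrame to get a matrix-like DelayedArray object.
DA <- DelayedArray(DF)

# Matrix-style operations are now available.
rs <- rowSums(DA)

# Arithmetic is delayed: nothing is computed until the object is realized.
DA2 <- DA * 2L

# Coerce back to a DataFrame.
DF2 <- as(DA2, "DataFrame")
```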

Hope this helps,

H.


Thanks a lot Hervé! It took me some time to understand the obvious, but the DelayedArray wrappers are exactly what I needed.

Would you recommend that I wrap in DelayedArrays just before performing matrix-like operations, or that I use the DelayedArray class as the base class for the assays in the SummarizedExperiment objects that I produce?

(The background of my questions is that I am refactoring the CAGEr package to use MultiAssayExperiments, SummarizedExperiments and DataFrames of Rles extensively).


Hi Hervé, I have been using rowSums(DelayedArray(DF)) for almost 6 years now, but this week I got curious about performance and ran a benchmark. Interestingly, decoding the values and summing them is much faster than either wrapping the DataFrame in a DelayedArray or summing the Rle values without decoding them. I hope this is useful to you and others. Incidentally, ChatGPT did not give working code because it confused runValue and decode...
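The benchmark code itself is not shown in this thread, so the following is only a sketch of the three approaches being compared, on made-up toy data, together with the runValue/decode distinction mentioned above. Variable names are illustrative.

```r
library(S4Vectors)
library(DelayedArray)

x <- Rle(c(0L, 0L, 3L, 3L, 3L))
runValue(x)  # one value per run: 0 3
decode(x)    # the full-length vector: 0 0 3 3 3

DF <- DataFrame(a = x, b = Rle(c(1L, 1L, 1L, 0L, 2L)))

# 1. Decode each column, then add the ordinary vectors.
sum_decoded <- Reduce(`+`, lapply(DF, decode))

# 2. Wrap the DataFrame in a DelayedArray and use its rowSums() method.
sum_delayed <- rowSums(DelayedArray(DF))

# 3. Add the Rle columns directly, without decoding; the result is an Rle.
sum_rle <- Reduce(`+`, as.list(DF))
```

All three give the same row sums; to compare speed, each line could be timed with system.time() on realistically sized data.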


Hi Charles,

Thanks for the feedback. Operating _natively_ on the DF of Rle objects will always be more efficient than first wrapping the object in a DelayedArray object. The latter is only a quick and easy way to get access to all the operations supported by DelayedArray objects in general. However, nothing replaces operations that are implemented to work directly on a specific type of DelayedArray seed.

Note that these "native operations" must be careful to avoid expanding all the Rle's in the DF _at once_. This is easy to do with rowSums(), but is sometimes less straightforward, as in the case of rowVars().
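One way to respect that constraint is to decode a single column at a time and keep only running totals in memory. The sketch below is an illustration under that assumption, not an implementation from any package; the helper names rle_row_sums and rle_row_vars are hypothetical. The variance uses two passes (means first, then squared deviations), so each column is decoded twice but never more than one at once.

```r
library(S4Vectors)

# Hypothetical helper: row sums, decoding one Rle column at a time.
rle_row_sums <- function(DF) {
  total <- numeric(nrow(DF))
  for (j in seq_len(ncol(DF))) {
    total <- total + decode(DF[[j]])
  }
  total
}

# Hypothetical helper: row variances in two passes over the columns.
rle_row_vars <- function(DF) {
  n <- ncol(DF)
  means <- rle_row_sums(DF) / n            # first pass: row means
  ss <- numeric(nrow(DF))
  for (j in seq_len(n)) {                  # second pass: squared deviations
    ss <- ss + (decode(DF[[j]]) - means)^2
  }
  ss / (n - 1)
}

# Toy data (made-up values).
DF <- DataFrame(a = Rle(c(0, 0, 3, 3, 3)),
                b = Rle(c(1, 1, 1, 0, 2)),
                c = Rle(c(2, 2, 2, 2, 2)))

row_sums <- rle_row_sums(DF)
row_vars <- rle_row_vars(DF)
```

Peak memory stays at a few full-length vectors regardless of the number of columns, at the cost of decoding each column twice for the variance.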

Best,

H.
