Question

Is there a way to speed up splitting a relatively large GRanges object by a given threshold?

Jurat Shahidin ▴ 80

@jurat-shahidin-9488

Last seen 5.0 years ago

Chicago, IL, USA

Hi everyone:

I ran into an issue when trying to split a relatively large GRanges object by a given threshold value: my approach is very slow. It would probably be fast with a data.frame, but I would rather stay with GRanges, since it is designed for working with genomic intervals. Can anyone suggest a faster way to split a GRanges object?

> length(gr)
[1] 36678

I tried this:

lapply(gr, function(x) split(x, c("keep", "saved")[(x$p.value <= 1e-08)+1]))

However, this is very slow and the output format is not what I want, which motivated me to look for another approach. How can I speed this up? And if a GRanges object is unexpectedly large and still has to be split by comparing its metadata against a given threshold, what should I do? Any help is appreciated.

My expected output is a list (skeleton of the output):

gr
 gr$keep
 
 gr$saved

Here is the session info:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] metap_0.7             rtracklayer_1.33.12   GenomicRanges_1.25.94 GenomeInfoDb_1.9.14
[5] IRanges_2.7.17        S4Vectors_0.11.18     BiocGenerics_0.19.2

loaded via a namespace (and not attached):
 [1] lattice_0.20-33            XML_3.98-1.4               Rsamtools_1.25.2
 [4] Biostrings_2.41.4          bitops_1.0-6               GenomicAlignments_1.9.6
 [7] grid_3.3.1                 zlibbioc_1.19.0            XVector_0.13.7
[10] Matrix_1.2-6               BiocParallel_1.7.9         tools_3.3.1
[13] Biobase_2.33.4             RCurl_1.95-4.8             SummarizedExperiment_1.3.82

r granges split performance • 1.2k views

ADD COMMENT • link updated 8.4 years ago by Michael Lawrence ★ 11k • written 8.4 years ago by Jurat Shahidin ▴ 80

score 2 · Accepted Answer · 2016-11-03

Michael Lawrence ★ 11k

@michael-lawrence-3846

Last seen 3.3 years ago

United States

Why not just do:

split(gr, ifelse(gr$p.value <= 1e-08, "saved", "keep"))
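For illustration, here is a minimal sketch of that one-call split on toy data. The `gr` built below is a made-up stand-in for the poster's object (the coordinates and p-values are assumptions, not the real data). The likely reason the `lapply()` version is slow is that it visits the ranges one at a time, so `split()` runs once per length-1 GRanges; here the grouping vector is computed once for all ranges and `split()` is called once.

```r
library(GenomicRanges)

# Toy stand-in for the poster's `gr`; all values are invented for illustration
set.seed(1)
n <- 1000
gr <- GRanges(
  seqnames = "chr1",
  ranges   = IRanges(start = seq(1, by = 100, length.out = n), width = 50),
  p.value  = 10^(-runif(n, min = 0, max = 16))
)

# One group label per range, from a single vectorized comparison
grp <- ifelse(gr$p.value <= 1e-08, "saved", "keep")

# A single split() call returns a GRangesList with one element per label
res <- split(gr, grp)

names(res)       # the group labels
lengths(res)     # number of ranges in each group
res[["saved"]]   # ranges with p.value <= 1e-08
res[["keep"]]    # the remaining ranges
```

The result is a GRangesList rather than a plain list, which matches the `gr$keep` / `gr$saved` skeleton in the question while keeping each group as a GRanges object.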

ADD COMMENT • link 8.4 years ago Michael Lawrence ★ 11k

This works quite well; I didn't expect it to work this well. Thank you, Michael :)

ADD REPLY • link 8.4 years ago Jurat Shahidin ▴ 80