Hi everyone,
I ran into an issue when trying to split a relatively big GRanges object by a given threshold value: my approach is very slow to reach the expected result. It would be fast if I used a data.frame, but I trust that GRanges objects are the right structure for working with genomic intervals. Can anyone suggest a way to split a GRanges object like this relatively quickly? How can I make this happen? Any ideas?
> length(gr)
[1] 36678
I tried it this way:
lapply(gr, function(x) split(x, c("keep", "saved")[(x$p.value <= 1e-08)+1]))
but this is very slow and the output format is not what I want, which motivated me to look for another approach. How can I speed up the process above? And what should I do if the GRanges object is unexpectedly large and I still need to split it by comparing a metadata column against a given threshold? Any help is appreciated.
My expected output would be a list (skeleton of the output):
gr
  gr$keep
  gr$saved
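For reference, here is a minimal vectorized sketch of what I mean (not necessarily the best way): it assumes the metadata column is called p.value and uses the same keep/saved labels as above, comparing the whole column at once and calling split() a single time instead of looping over individual ranges. The example GRanges is made up purely for illustration.

library(GenomicRanges)

## hypothetical toy GRanges with a p.value metadata column
gr <- GRanges("chr1", IRanges(start = (1:10) * 100, width = 50),
              p.value = 10^-runif(10, 0, 12))

## vectorized labelling: one comparison over the whole column
lab <- ifelse(gr$p.value <= 1e-08, "saved", "keep")

## a single split() on the label vector returns a GRangesList
res <- split(gr, lab)

res$keep   # ranges with p.value above the threshold
res$saved  # ranges with p.value at or below the threshold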
Here is the session info:
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] metap_0.7 rtracklayer_1.33.12 GenomicRanges_1.25.94 GenomeInfoDb_1.9.14
[5] IRanges_2.7.17 S4Vectors_0.11.18 BiocGenerics_0.19.2
loaded via a namespace (and not attached):
[1] lattice_0.20-33 XML_3.98-1.4 Rsamtools_1.25.2
[4] Biostrings_2.41.4 bitops_1.0-6 GenomicAlignments_1.9.6
[7] grid_3.3.1 zlibbioc_1.19.0 XVector_0.13.7
[10] Matrix_1.2-6 BiocParallel_1.7.9 tools_3.3.1
[13] Biobase_2.33.4 RCurl_1.95-4.8 SummarizedExperiment_1.3.82
This works quite well; I didn't expect it to turn out this well. Thank you, Michael :)