arne.mueller@novartis.com
Hello,
I noticed a massive performance difference when subsetting a GRanges
object: indexing by name (x[names]) is far slower than calling subset()
with a logical vector built from the same names.
Example:
> length(mm9.tiled)
[1] 5309835
> n = names(mm9.tiled)
> rn = sample(n, 1000)
> system.time(tmp <- subset(mm9.tiled, names(mm9.tiled) %in% rn))
   user  system elapsed
  1.610   0.131   1.741
> system.time(tmp <- mm9.tiled[rn])
   user  system elapsed
 72.793   0.167  72.976
>
> sessionInfo()
R version 2.14.0 Under development (unstable) (2011-06-01 r56028)
Platform: x86_64-unknown-linux-gnu/x86_64 (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] GenomicRanges_1.5.12 IRanges_1.11.10
loaded via a namespace (and not attached):
[1] tools_2.14.0
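For reference, here is a minimal sketch of the three subscript styles involved, using a plain named vector as a stand-in for the GRanges object (an assumption: the toy vector, its size, and the "tile" names are made up for illustration, and the sketch does not reproduce the slowdown itself). It only shows that match() turns the names into integer positions that select the same elements, in the same order, as the character subscript, whereas the %in% form returns them in the order of the object. Whether an integer subscript avoids the slow path on a GRanges object is not something I have measured.

```r
# Toy stand-in for the GRanges object: a named integer vector (assumption).
set.seed(1)
x <- seq_len(1e5)
names(x) <- paste0("tile", seq_along(x))
rn <- sample(names(x), 1000)          # names to look up, in random order

by_name  <- x[rn]                     # character subscript (the slow form in my timing)
by_match <- x[match(rn, names(x))]    # integer subscript, same order as rn
by_logic <- x[names(x) %in% rn]       # logical subscript, order of x

# Character and match()-based subscripts give the same result.
stopifnot(identical(by_name, by_match))
# The logical form selects the same elements, possibly in a different order.
stopifnot(setequal(names(by_logic), rn))
```

Note that the two calls I timed are therefore not strictly equivalent: the results contain the same ranges, but their order can differ.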
Is this known (intended?) behavior?
Regards,
Arne