Hi all,
If I have an ordered vector of genomic positions on chr1
> head(positions) [1] 11161 12412 12462 12687 12830 13316 > length(positions) [1] 576356
The distances between consecutive positions is represented by
> d <- positions[-1] - positions[-length(positions)] > head(d) [1] 1251 50 225 143 486 105
I want to enumerate all possible pairs of genomic locations such that the linear distance between the two points is <= D. I want an alternative to combn function from base package. What is fastest and least memory intensity way to do this in R to do this?
sessionInfo() R version 3.1.2 (2014-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 data.table_1.9.4 [3] BSgenome_1.34.1 rtracklayer_1.26.2 [5] GenomicAlignments_1.2.2 Rsamtools_1.18.3 [7] Biostrings_2.34.1 XVector_0.6.0 [9] GenomicRanges_1.18.4 GenomeInfoDb_1.2.4 [11] IRanges_2.0.1 S4Vectors_0.4.0 [13] Biobase_2.26.0 BiocGenerics_0.12.1 [15] Matrix_1.1-5 loaded via a namespace (and not attached): [1] base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.9 BiocParallel_1.0.3 [5] bitops_1.0-6 brew_1.0-6 checkmate_1.5.1 chron_2.3-45 [9] codetools_0.2-10 DBI_0.3.1 digest_0.6.8 fail_1.2 [13] foreach_1.4.2 grid_3.1.2 iterators_1.0.7 lattice_0.20-30 [17] plyr_1.8.1 Rcpp_0.11.4 RCurl_1.95-4.5 reshape2_1.4.1 [21] RSQLite_1.0.0 sendmailR_1.2-1 stringr_0.6.2 tools_3.1.2 [25] XML_3.98-1.1 zlibbioc_1.12.0 >