I have a compressed IRangesList from which I would like to extract the first IRange for each element. I wrote the following example, which does exactly what I want, but when I run it on my actual data I get the error 'Error: subscript contains out-of-bounds indices'.
range1 <- IRanges(start=c(1, 2), end=c(5, 2))
range2 <- IRanges(start=c(15, 45, 20), end=c(15, 100, 80))
range3 <- IRanges(start=c(7), end=c(55))
range4 <- IRanges(start=c(7), end=c(55))
range5 <- IRanges(start=c(20, 63), end=c(40, 123))
named <- IRangesList(one = range1, two = range2, three = range3, four = range4, five = range5)
i <- rep(1, each = length(named))
i0 <- splitAsList(i)
named[i0]
IRangesList object of length 5:
$one
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         5         5

$two
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        15        15         1

$three
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         7        55        49

$four
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         7        55        49

$five
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        20        40        21
I have figured out that the issue is that some elements of the list in my data do not contain any IRanges. When I incorporate this into my example, I get the same error I see with my data:
range1 <- IRanges(start=c(1, 2), end=c(5, 2))
range2 <- IRanges(start=c(15, 45, 20), end=c(15, 100, 80))
range3 <- IRanges(start=c(7), end=c(55))
range4 <- IRanges()
range5 <- IRanges(start=c(20, 63), end=c(40, 123))
named <- IRangesList(one = range1, two = range2, three = range3, four = range4, five = range5)
i <- rep(1, each = length(named))
i0 <- splitAsList(i)
named[i0]
Error: subscript contains out-of-bounds indices
Is there a way to say that it's fine to skip over the elements with no IRanges? I do need them included as empty in the output because later on I extract the IRanges from the DNAStringSet that the IRanges were generated from.
sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so; LAPACK version 3.9.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8
[5] LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8 LC_PAPER=C.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.70.3 GenomeInfoDb_1.38.8 XVector_0.42.0 IRanges_2.36.0
[5] S4Vectors_0.40.2 BiocGenerics_0.48.1
loaded via a namespace (and not attached):
[1] zlibbioc_1.48.2 compiler_4.3.3 tools_4.3.3
[4] GenomeInfoDbData_1.2.11 rstudioapi_0.16.0 RCurl_1.98-1.16
[7] crayon_1.5.3 bitops_1.0-8
That worked, thanks. I just had to tweak it a little to make it work on my dataset.
It takes about 30 s to run on my 8000-read test dataset, though. I wonder if there is a more efficient way of doing this, but I am all out of ideas. Thanks for the hint on tagging. I'm quite new to asking questions, as I've always managed to find answers in existing posts before. I have edited my post and will choose my tags more carefully in future.
How about this
Obviously requires a named IRangesList.
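For what it's worth, here is a minimal sketch of one way to get the first range per element while keeping empty elements empty. It is not necessarily the code suggested above. The idea is to build the subscript so that an empty element gets an empty index rather than 1, using elementNROWS() to count the ranges in each element. The names n_first, i0 and first are only illustrative.

```r
library(IRanges)

# Same toy data as in the question, with an empty element in position four
named <- IRangesList(
  one   = IRanges(start = c(1, 2), end = c(5, 2)),
  two   = IRanges(start = c(15, 45, 20), end = c(15, 100, 80)),
  three = IRanges(start = 7, end = 55),
  four  = IRanges(),
  five  = IRanges(start = c(20, 63), end = c(40, 123))
)

# Number of ranges to take from each element: 1 if it has any, 0 if it is empty
n_first <- pmin(elementNROWS(named), 1L)

# seq_len(1) is 1L and seq_len(0) is integer(0), so empty elements get an
# empty subscript instead of an out-of-bounds index
i0 <- IntegerList(lapply(unname(n_first), seq_len))

# Element-wise subsetting; empty elements stay in the result as empty IRanges
first <- named[i0]
first
```

Because the subscript is unnamed it is applied by position, so this version should not depend on the list having names. The result has the same length and names as the input, which should keep it aligned with the DNAStringSet the ranges came from.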