Question

Is there a way to allow elements with no IRanges when subsetting an IRangesList?

Christine Jones (United Kingdom)

I have a compressed IRangesList from which I would like to extract the first range of each element. I wrote the following example, which does exactly what I want, but when I run it on my actual data I get the error 'Error: subscript contains out-of-bounds indices'.


library(IRanges)  # also attaches S4Vectors, which provides splitAsList()

range1 <- IRanges(start=c(1, 2), end=c(5, 2))
range2 <- IRanges(start=c(15, 45, 20), end=c(15, 100, 80))
range3 <- IRanges(start=c(7), end=c(55))
range4 <- IRanges(start=c(7), end=c(55))
range5 <- IRanges(start=c(20, 63), end=c(40, 123))
named <- IRangesList(one = range1, two = range2, three = range3, four = range4, five = range5)

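# one index (1) per list element, stored as a list
i <- rep(1, each = length(named))
i0 <- splitAsList(i)
# subsetting an IRangesList by a list of indices subsets each element in turn
named[i0]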

IRangesList object of length 5:
$one
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         1         5         5

$two
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        15        15         1

$three
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         7        55        49

$four
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]         7        55        49

$five
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  [1]        20        40        21

I have figured out that the problem is that some elements of the list in my data contain no ranges at all. When I add an empty element to my example, I get the same error as with my data.

range1 <- IRanges(start=c(1, 2), end=c(5, 2))
range2 <- IRanges(start=c(15, 45, 20), end=c(15, 100, 80))
range3 <- IRanges(start=c(7), end=c(55))
range4 <- IRanges()
range5 <- IRanges(start=c(20, 63), end=c(40, 123))
named <- IRangesList(one = range1, two = range2, three = range3, four = range4, five = range5)

i <- rep(1, each = length(named))
i0 <- splitAsList(i)
named[i0]

Error: subscript contains out-of-bounds indices
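A minimal diagnostic sketch (not part of the original post), assuming the same example data as above: `lengths()` shows which elements are empty, and an element with zero ranges has no position 1, which is what makes `named[i0]` fail.

```r
library(IRanges)

# Example data from the question, with an empty fourth element
range1 <- IRanges(start = c(1, 2), end = c(5, 2))
range2 <- IRanges(start = c(15, 45, 20), end = c(15, 100, 80))
range3 <- IRanges(start = 7, end = 55)
range4 <- IRanges()
range5 <- IRanges(start = c(20, 63), end = c(40, 123))
named <- IRangesList(one = range1, two = range2, three = range3,
                     four = range4, five = range5)

# Number of ranges in each element; "four" has none
n <- lengths(named)
empty <- names(named)[n == 0]

# Index 1 for every element, as in the question; "four" has no position 1
i0 <- splitAsList(rep(1, length(named)))
msg <- tryCatch({ named[i0]; "no error" }, error = conditionMessage)

print(empty)
print(msg)
```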

Is there a way to tell the subset operation that it's fine to skip the elements with no ranges? I do need them kept as empty elements in the output, because later on I use these ranges to extract subsequences from the DNAStringSet they were generated from.

sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so;  LAPACK version 3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Biostrings_2.70.3   GenomeInfoDb_1.38.8 XVector_0.42.0      IRanges_2.36.0     
[5] S4Vectors_0.40.2    BiocGenerics_0.48.1

loaded via a namespace (and not attached):
[1] zlibbioc_1.48.2         compiler_4.3.3          tools_4.3.3            
[4] GenomeInfoDbData_1.2.11 rstudioapi_0.16.0       RCurl_1.98-1.16        
[7] crayon_1.5.3            bitops_1.0-8

Tags: IRanges Biostrings S4Vectors

Written 5 months ago by Christine Jones

score 1 · Answer 1 · 2024-10-25

Aidan (United States)

I haven't thought about this for long, so there may be a better way, but my initial idea is to set the index in the splitAsList() result to 0 wherever the corresponding IRanges object has no ranges, like so:

range1 <- IRanges(start=c(1, 2), end=c(5, 2))
range2 <- IRanges(start=c(15, 45, 20), end=c(15, 100, 80))
range3 <- IRanges(start=c(7), end=c(55))
range4 <- IRanges()
range5 <- IRanges(start=c(20, 63), end=c(40, 123))
named <- IRangesList(one = range1, two = range2, three = range3, four = range4, five = range5)

i <- rep(1, each = length(named))
i0 <- splitAsList(i)

## make 0-length lists have 0 in i0 so nothing is selected
i0[lengths(named) == 0] <- 0
##

named[i0]

If i0[[x]] is 0, the subset named[i0] returns an empty IRanges for element x instead of throwing an error.
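The zero-index behaviour can be checked directly, and the same index list can also be built in one step rather than by replacement. A minimal sketch, assuming the example data from the question: `pmin(lengths(named), 1L)` is 1 for non-empty elements and 0 for empty ones. Building the list with `IntegerList(as.list(...))` is my own choice here, not something suggested in the thread.

```r
library(IRanges)

range1 <- IRanges(start = c(1, 2), end = c(5, 2))
range2 <- IRanges(start = c(15, 45, 20), end = c(15, 100, 80))
range3 <- IRanges(start = 7, end = 55)
range4 <- IRanges()
range5 <- IRanges(start = c(20, 63), end = c(40, 123))
named <- IRangesList(one = range1, two = range2, three = range3,
                     four = range4, five = range5)

# A zero index selects nothing, so it returns an empty IRanges
empty_sel <- IRanges(start = 1, end = 5)[0]

# 1 for elements with at least one range, 0 for empty elements
idx <- pmin(unname(lengths(named)), 1L)
i0 <- IntegerList(as.list(idx))

# First range of each element; "four" stays in place as an empty IRanges
firsts <- named[i0]
print(firsts)
```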

If that doesn't work, let me know and I can think about it or troubleshoot some more.

Edit: for future reference, this has more to do with the IRanges and S4Vectors packages than with Biostrings. Tagging Biostrings will probably reach most of the same people, but tagging IRanges as well may give the question more visibility.

Written 5 months ago by Aidan

That worked, thanks! I just had to tweak it slightly to make it work on my dataset:

i0[unname(lengths(named)) == 0] <- 0

It takes about 30 s to run on my 8,000-read test dataset, though. I wonder whether there is a more efficient way to do this, but I'm out of ideas. Thanks for the hint on tagging; I'm quite new to asking questions, as I've always managed to find answers in existing posts before. I have edited my post and will choose my tags more carefully in future.
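One possibly faster route, which has not been benchmarked on data of this size: IRanges provides `heads(x, n)`, which works like `head()` applied to each list element, so elements shorter than `n` (including empty ones) are returned as they are and no index list is needed. A sketch assuming your installed IRanges version exports `heads()`:

```r
library(IRanges)

range1 <- IRanges(start = c(1, 2), end = c(5, 2))
range2 <- IRanges(start = c(15, 45, 20), end = c(15, 100, 80))
range3 <- IRanges(start = 7, end = 55)
range4 <- IRanges()
range5 <- IRanges(start = c(20, 63), end = c(40, 123))
named <- IRangesList(one = range1, two = range2, three = range3,
                     four = range4, five = range5)

# First range of each element; elements with no ranges come back empty
firsts <- heads(named, n = 1L)
print(firsts)
```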

Written 5 months ago by Christine Jones

How about this:

## add in a zero-length IRanges object
> range6 <- IRanges()
> named <- IRangesList(one = range1, two = range2, three = range3, four = range4, five = range5, six = range6)
> z <- unlist(named)
> z <- z[!duplicated(names(z))]
> splitAsList(z, factor(names(z), names(z)))
IRangesList object of length 5:
$one
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  one         1         5         5

$two
IRanges object with 1 range and 0 metadata columns:
          start       end     width
      <integer> <integer> <integer>
  two        15        15         1

$three
IRanges object with 1 range and 0 metadata columns:
            start       end     width
        <integer> <integer> <integer>
  three         7        55        49

$four
IRanges object with 1 range and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
  four         7        55        49

$five
IRanges object with 1 range and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
  five        20        40        21

This obviously requires a named IRangesList, and the names must be unique.
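In the output above, the zero-length element six does not appear, because the factor levels are taken from the names that survive unlist(). If empty elements need to stay in place, as the question requires, one variation is to take the levels from the original list names; splitAsList() keeps empty levels by default (drop = FALSE). A sketch assuming unique, non-missing element names:

```r
library(IRanges)

range1 <- IRanges(start = c(1, 2), end = c(5, 2))
range2 <- IRanges(start = c(15, 45, 20), end = c(15, 100, 80))
range3 <- IRanges(start = 7, end = 55)
range4 <- IRanges()
range5 <- IRanges(start = c(20, 63), end = c(40, 123))
named <- IRangesList(one = range1, two = range2, three = range3,
                     four = range4, five = range5)

z <- unlist(named)               # element names repeated once per range
z <- z[!duplicated(names(z))]    # first range of each non-empty element

# Levels from the original names keep empty elements such as "four"
firsts <- splitAsList(z, factor(names(z), levels = names(named)))
print(firsts)
```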

Written 5 months ago by James W. MacDonald