reduce function in GenomicRanges
1
0
Entering edit mode
Frocha ▴ 20
@frocha-12039
Last seen 6.7 years ago

How to combine genomic intervals with gap widths (e.g., <100) on a circular reference sequence? The reduce() does not work well for intervals nearby the base with coordinate 1.

genomicranges reduce • 4.0k views
ADD COMMENT
0
Entering edit mode

Do you have an example?

ADD REPLY
0
Entering edit mode

Here is an example:

> seqinfo <- Seqinfo("chr1",
+ seqlengths=100,
+ isCircular=TRUE)
>
> gr=GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
+ strand="*",seqinfo=seqinfo)
>
> reduce(gr,min.gapwidth=10)
GRanges object with 2 ranges and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chr1 [ 5,  10]      *
  [2]     chr1 [95, 101]      *
  -------
  seqinfo: 1 sequence (1 circular) from an unspecified genome

Ideally, after reduce() the gr should contain one interval: [95, 110]

 

ADD REPLY
1
Entering edit mode
@herve-pages-1542
Last seen 11 days ago
Seattle, WA, United States

[Edited on Jul 23, 2017: Simplified and fixed bug in circularReduceIRanges(). Also added the min.gapwidth argument.]

Hi,

Indeed, reduce() does not handle circularity yet.

The circularReduceGRanges() function below implements a circularity-aware reduction for GRanges objects:

seqinfo <- Seqinfo("chr1", seqlengths=100, isCircular=TRUE)
gr <- GRanges("chr1", IRanges(c(5, 95), c(10, 101)),
              seqinfo=seqinfo)
circularReduceGRanges(gr, min.gapwidth=10)
# GRanges object with 1 range and 1 metadata column:
#       seqnames    ranges strand |        revmap
#          <Rle> <IRanges>  <Rle> | <IntegerList>
#   [1]     chr1 [95, 110]      * |           2,1
#   -------
#  seqinfo: 1 sequence (1 circular) from an unspecified genome

With a more complicated object:

seqinfo <- Seqinfo(seqnames=c("chr1", "chr2"),
                   seqlengths=c(100, NA),
                   isCircular=c(TRUE, FALSE))
gr2 <- GRanges(Rle(c("chr2", "chr1"), c(3, 7)),
               IRanges(c(105,20, 44, 8, 95,103,25,55, 2,21),
                       c(222,35,104,15,101,105,30,60,10,28)),
               seqinfo=seqinfo)
circularReduceGRanges(gr2)
# GRanges object with 5 ranges and 1 metadata column:
#       seqnames    ranges strand |        revmap
#          <Rle> <IRanges>  <Rle> | <IntegerList>
#   [1]     chr1 [21,  30]      * |          10,7
#   [2]     chr1 [55,  60]      * |             8
#   [3]     chr1 [95, 115]      * |     5,9,6,...
#   [4]     chr2 [20,  35]      * |             2
#   [5]     chr2 [44, 222]      * |           3,1
#   -------
#   seqinfo: 2 sequences (1 circular) from an unspecified genome

Lightly tested only...

Cheers,
H.

circularReduceIRanges <- function(x, min.gapwidth=1L, circle.length=NA)
{
    stopifnot(is(x, "IRanges"), isSingleNumberOrNA(circle.length))
    if (is.na(circle.length))
        return(reduce(x, min.gapwidth=min.gapwidth, with.revmap=TRUE))
    circle.length <- as.integer(circle.length)
    stopifnot(circle.length >= 1L)

    x <- IRanges:::.shift_ranges_to_first_circle(x, circle.length)
    ans <- reduce(c(x, shift(x, circle.length)),
                  min.gapwidth=min.gapwidth, with.revmap=TRUE)
    ans <- ans[start(ans) <= circle.length]
    mcols(ans)$revmap <- (mcols(ans)$revmap - 1L) %% length(x) + 1L
    ans_len <- length(ans)
    # Jul 23, 2017: Found a bug in the original function and fixed it by
    # replacing its last lines with the following lines.
    if (ans_len <= 1L) 
        return(ans)
    last_end <- end(ans)[[ans_len]]
    keep_idx <- which(start(ans) > last_end - circle.length)
    if (length(keep_idx) == 0L)
        keep_idx <- ans_len
    ans[keep_idx]
}

circularReduceGRanges <- function(x, min.gapwidth=1L)
{
    stopifnot(is(x, "GRanges"))
    rgl <- GenomicRanges:::deconstructGRintoRGL(x, with.revmap=TRUE)
    circle_lens <- seqlengths(x)
    circle_lens[!(isCircular(x) %in% TRUE)] <- NA
    circle_lens <- rep(circle_lens, each=3L)
    rgl2 <- IRangesList(mapply(circularReduceIRanges, rgl, circle_lens,
                               MoreArgs=list(min.gapwidth=min.gapwidth)))
    rgl2 <- GenomicRanges:::.fix_inner_revmap_mcol(rgl2, rgl)
    GenomicRanges:::reconstructGRfromRGL(rgl2, x)
}
ADD COMMENT
0
Entering edit mode

Thank you very much!

However, I wish to have a function which allows min.gapwidth option. Is there a way to do so?

ADD REPLY
0
Entering edit mode

Oh I see now why you expected ranges chr1:5-10 and chr1:95-101 to be merged. Sorry for missing that. I modified my answer above to support the min.gapwidth argument. Also fixed a bug in circularReduceIRanges().

Cheers,

H.

ADD REPLY
0
Entering edit mode

Thank you very much!

ADD REPLY

Login before adding your answer.

Traffic: 859 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6