IRanges: Trying to cut overlapping intervals into pieces

0

Entering edit mode

Elizabeth Purdom ▴ 210

@elizabeth-purdom-2486

Last seen 3.0 years ago

USA/ Berkeley/UC Berkeley

Hi, I am trying to take overlapping intervals and return a set of intervals that are not overlapping but cover all of the region (and mantain the intervals that don't overlap). In particular, I don't want to merge intervals that overlap together (i.e. the reduce function in IRanges)-- I want to cut them up into distinct regions. For example, if I have intervals: [1,6], [4,8], [7,10] I want to get back the set of adjacent intervals: [1,3],[4,6],[7,8],[9,10] The options I find that look like they perhaps do this (intersect or setdiff?) seem to be related to the 'normal' ranges class; but this class requires a gap between intervals -- no adjacent intervals -- which is not what I want. Is there a nice way to do this with IRanges (or a not so nice one, but fast)? Similarly, is there a 'reduce' version that doesn't merge adjacent intervals but only truly overlapping ones? There are a lot of annotation examples where you wouldn't not want to merge adjacent intervals (e.g. UTRs) Thanks for any assistance! Elizabeth Purdom Division of Biostatistics UC, Berkeley

IRanges IRanges • 1.6k views

ADD COMMENT • link updated 15.8 years ago by Michael Lawrence ▴ 620 • written 15.8 years ago by Elizabeth Purdom ▴ 210

0

Entering edit mode

Michael Lawrence ▴ 620

@michael-lawrence-2759

Last seen 10.2 years ago

On Fri, Jan 23, 2009 at 10:24 PM, Elizabeth Purdom < epurdom@stat.berkeley.edu> wrote: > Hi, > > I am trying to take overlapping intervals and return a set of intervals > that are not overlapping but cover all of the region (and mantain the > intervals that don't overlap). In particular, I don't want to merge > intervals that overlap together (i.e. the reduce function in IRanges)-- I > want to cut them up into distinct regions. For example, if I have intervals: > [1,6], [4,8], [7,10] > I want to get back the set of adjacent intervals: > [1,3],[4,6],[7,8],[9,10] Well that's a fun one. ir <- IRanges(c(1, 4, 7), c(6, 8, 10)) adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))), sort(unique(c(end(ir), tail(start(ir),-1)-1)))) ... is a not so nice one, but pretty fast.. But if you had a gap in those ranges, like: ir <- IRanges(c(1, 4, 10), c(6, 8, 10)) So there's a gap at position 9, you would need an additional filtering step: adj[adj %in% ir] This last step requires the devel version of IRanges, but can be emulated using !is.na(overlap(ir, adj, multiple=FALSE)). > The options I find that look like they perhaps do this (intersect or > setdiff?) seem to be related to the 'normal' ranges class; but this class > requires a gap between intervals -- no adjacent intervals -- which is not > what I want. Is there a nice way to do this with IRanges (or a not so nice > one, but fast)? > The intersect and setdiff functions are for any Ranges, normal or not. They return normal IRanges though. Perhaps the documentation does not make this clear. They probably aren't very useful functions. > > Similarly, is there a 'reduce' version that doesn't merge adjacent > intervals but only truly overlapping ones? There are a lot of annotation > examples where you wouldn't not want to merge adjacent intervals (e.g. UTRs) > Try a trick like this: ir2 <- IRanges(c(1, 5, 7), c(4, 6, 9)) width(ir2) <- width(ir2) - 1 rir2 <- reduce(ir2) width(rir2) <- width(rir2) + 1 Or find the overlap, reduce those that did overlap and combine that result with those that did not overlap. > Thanks for any assistance! Thanks for providing more use cases. We'll consider adding functionality along these lines to the base package (actually the reduce one has been on the TODO list for many months). > > Elizabeth Purdom > Division of Biostatistics > UC, Berkeley > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 15.8 years ago Michael Lawrence ▴ 620

0

Entering edit mode

Thanks for the help. It's much neater than what I had (and I do have gaps). If you are thinking about adding this kind of functionality, I'd be glad to tell you more specifically what I am doing if it helps to have a more general picture. I kept thinking the intersect or setdiff would help for different things, but it wound up not doing what I needed. For example, it doesn't do pairwise, but across the entire set, right? I often had a set of intervals that I wanted to 'subtract' pairwise from another set of intervals. Just to know, is there a function that does that? I thought maybe 'narrow', but sometimes subtracting an interval would cut an existing interval into pieces, and narrow doesn't seem to do this. Best, Elizabeth Michael Lawrence wrote: > > > On Fri, Jan 23, 2009 at 10:24 PM, Elizabeth Purdom > <epurdom at="" stat.berkeley.edu="" <mailto:epurdom="" at="" stat.berkeley.edu="">> wrote: > > Hi, > > I am trying to take overlapping intervals and return a set of > intervals that are not overlapping but cover all of the region (and > mantain the intervals that don't overlap). In particular, I don't > want to merge intervals that overlap together (i.e. the reduce > function in IRanges)-- I want to cut them up into distinct regions. > For example, if I have intervals: > [1,6], [4,8], [7,10] > I want to get back the set of adjacent intervals: > [1,3],[4,6],[7,8],[9,10] > > > Well that's a fun one. > > ir <- IRanges(c(1, 4, 7), c(6, 8, 10)) > adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))), > sort(unique(c(end(ir), tail(start(ir),-1)-1)))) > > ... is a not so nice one, but pretty fast.. > > But if you had a gap in those ranges, like: > > ir <- IRanges(c(1, 4, 10), c(6, 8, 10)) > > So there's a gap at position 9, you would need an additional filtering step: > > adj[adj %in% ir] > > This last step requires the devel version of IRanges, but can be > emulated using !is.na <http: is.na="">(overlap(ir, adj, multiple=FALSE)). > > > The options I find that look like they perhaps do this (intersect or > setdiff?) seem to be related to the 'normal' ranges class; but this > class requires a gap between intervals -- no adjacent intervals -- > which is not what I want. Is there a nice way to do this with > IRanges (or a not so nice one, but fast)? > > > The intersect and setdiff functions are for any Ranges, normal or not. > They return normal IRanges though. Perhaps the documentation does not > make this clear. They probably aren't very useful functions. > > > > Similarly, is there a 'reduce' version that doesn't merge adjacent > intervals but only truly overlapping ones? There are a lot of > annotation examples where you wouldn't not want to merge adjacent > intervals (e.g. UTRs) > > > Try a trick like this: > > ir2 <- IRanges(c(1, 5, 7), c(4, 6, 9)) > width(ir2) <- width(ir2) - 1 > rir2 <- reduce(ir2) > width(rir2) <- width(rir2) + 1 > > Or find the overlap, reduce those that did overlap and combine that > result with those that did not overlap. > > > > Thanks for any assistance! > > > Thanks for providing more use cases. We'll consider adding functionality > along these lines to the base package (actually the reduce one has been > on the TODO list for many months). > > > > Elizabeth Purdom > Division of Biostatistics > UC, Berkeley > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch <mailto:bioconductor at="" stat.math.ethz.ch=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD REPLY • link 15.8 years ago Elizabeth Purdom ▴ 210

0

Entering edit mode

Hi Elizabeth, Elizabeth Purdom wrote: > Thanks for the help. It's much neater than what I had (and I do have > gaps). If you are thinking about adding this kind of functionality, I'd > be glad to tell you more specifically what I am doing if it helps to > have a more general picture. > > I kept thinking the intersect or setdiff would help for different > things, but it wound up not doing what I needed. For example, it doesn't > do pairwise, but across the entire set, right? I often had a set of > intervals that I wanted to 'subtract' pairwise from another set of > intervals. Just to know, is there a function that does that? The union, setdiff or substraction of 2 ranges can not always be represented by a single range so we need to be careful when performing those operations pairwise. For the pairwise intersection, no problem, the intersection of 2 ranges can always be represented by a single range (eventually empty): pairwiseIntersect <- function(ir1, ir2) { if (length(ir1) != length(ir2)) stop("'ir1' and 'ir2' must have the same length") ans_start <- pmax.int(start(ir1), start(ir2)) ans_end <- pmin.int(end(ir1), end(ir2)) ans_width <- ans_end - ans_start + 1L ans_width[ans_width < 0L] <- 0L IRanges(start=ans_start, width=ans_width) } > ir1 <- IRanges(c(1, 5, -2, 0, 14), c(10, 9, 3, 11, 17)) > ir2 <- IRanges(c(14, 0, -5, 6, 18), c(20, 2, 2, 8, 20)) > ir1 IRanges object: start end width 1 1 10 10 2 5 9 5 3 -2 3 6 4 0 11 12 5 14 17 4 > ir2 IRanges object: start end width 1 14 20 7 2 0 2 3 3 -5 2 8 4 6 8 3 5 18 20 3 > pairwiseIntersect(ir1, ir2) IRanges object: start end width 1 14 13 0 <-- note the empty ranges when the intersection is empty 2 5 4 0 3 -2 2 5 4 6 8 3 5 18 17 0 The union of 2 ranges will produce 1 range if there is no gap between the 2 ranges (i.e. they are overlapping or adjacent). So, in the general case, it's not possible to return the result of the pairwise union in an IRanges object of the same length as the input. One solution is to add the 'fill.gap' argument to let the user control whether s/he wants to enforce the union even when there is a gap between the 2 ranges (in that case, the union is obtained by filling this gap): pairwiseUnion <- function(ir1, ir2, fill.gap=FALSE) { if (length(ir1) != length(ir2)) stop("'ir1' and 'ir2' must have the same length") if (!fill.gap) { gap <- pmax.int(start(ir1), start(ir2)) - pmin.int(end(ir1), end(ir2)) - 1L if (any(gap > 0L)) stop("some pair of ranges have a gap within ", "the 2 members of the pair.\n", " Use 'fill.gap=TRUE' to enforce union ", "by filling the gap.") } ans_start <- pmin.int(start(ir1), start(ir2)) ans_end <- pmax.int(end(ir1), end(ir2)) IRanges(start=ans_start, end=ans_end) } > pairwiseUnion(ir1, ir2) Error in pairwiseUnion(ir1, ir2) : some pair of ranges have a gap within the 2 members of the pair. Use 'fill.gap=TRUE' to enforce union by filling the gap. > pairwiseUnion(ir1[3:5], ir2[3:5]) IRanges object: start end width 1 -5 3 9 2 0 11 12 3 14 20 7 > pairwiseUnion(ir1, ir2, fill.gap=TRUE) IRanges object: start end width 1 1 20 20 <- gap was filled 2 0 9 10 <- gap was filled 3 -5 3 9 4 0 11 12 5 14 20 7 Subtracting range two from range one will produce 1 range except when range two has its end points strictly inside range one. What can we do in such situation? The following function will return an error when this happens: pairwiseSubtract <- function(ir1, ir2) { if (length(ir1) != length(ir2)) stop("'ir1' and 'ir2' must have the same length") ans_start <- start(ir1) ans_end <- end(ir1) if (any((start(ir2) > ans_start) & (end(ir2) < ans_end))) stop("some ranges in 'ir2' have their end points strictly inside\n", " the range in 'ir1' that they need to be subtracted from") start2 <- pmax.int(ans_start, start(ir2)) end2 <- pmin.int(ans_end, end(ir2)) ii <- start2 <= end2 jj <- end2 == ans_end kk <- ii & jj ans_end[kk] <- start2[kk] - 1L kk <- ii & (!jj) ans_start[kk] <- end2[kk] + 1L IRanges(start=ans_start, end=ans_end) } > pairwiseSubtract(ir1, ir2) Error in pairwiseSubtract(ir1, ir2) : some ranges in 'ir2' have their end points strictly inside the range in 'ir1' that they need to be subtracted from > start(ir2)[4] <- -99 > end(ir2)[4] <- 99 > pairwiseSubtract(ir1, ir2) IRanges object: start end width 1 1 10 10 2 5 9 5 3 3 3 1 4 0 -1 0 <- nothing left in this range 5 14 17 4 I will add these 3 functions to the IRanges package and will let you know when this is done. Cheers, H. > I thought > maybe 'narrow', but sometimes subtracting an interval would cut an > existing interval into pieces, and narrow doesn't seem to do this. > > Best, > Elizabeth > > Michael Lawrence wrote: >> >> >> On Fri, Jan 23, 2009 at 10:24 PM, Elizabeth Purdom >> <epurdom at="" stat.berkeley.edu="" <mailto:epurdom="" at="" stat.berkeley.edu="">> wrote: >> >> Hi, >> >> I am trying to take overlapping intervals and return a set of >> intervals that are not overlapping but cover all of the region (and >> mantain the intervals that don't overlap). In particular, I don't >> want to merge intervals that overlap together (i.e. the reduce >> function in IRanges)-- I want to cut them up into distinct regions. >> For example, if I have intervals: >> [1,6], [4,8], [7,10] >> I want to get back the set of adjacent intervals: >> [1,3],[4,6],[7,8],[9,10] >> >> >> Well that's a fun one. >> >> ir <- IRanges(c(1, 4, 7), c(6, 8, 10)) >> adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))), >> sort(unique(c(end(ir), tail(start(ir),-1)-1)))) >> >> ... is a not so nice one, but pretty fast.. >> >> But if you had a gap in those ranges, like: >> >> ir <- IRanges(c(1, 4, 10), c(6, 8, 10)) >> >> So there's a gap at position 9, you would need an additional filtering >> step: >> >> adj[adj %in% ir] >> >> This last step requires the devel version of IRanges, but can be >> emulated using !is.na <http: is.na="">(overlap(ir, adj, multiple=FALSE)). >> >> >> The options I find that look like they perhaps do this (intersect or >> setdiff?) seem to be related to the 'normal' ranges class; but this >> class requires a gap between intervals -- no adjacent intervals -- >> which is not what I want. Is there a nice way to do this with >> IRanges (or a not so nice one, but fast)? >> >> >> The intersect and setdiff functions are for any Ranges, normal or not. >> They return normal IRanges though. Perhaps the documentation does not >> make this clear. They probably aren't very useful functions. >> >> >> >> Similarly, is there a 'reduce' version that doesn't merge adjacent >> intervals but only truly overlapping ones? There are a lot of >> annotation examples where you wouldn't not want to merge adjacent >> intervals (e.g. UTRs) >> >> >> Try a trick like this: >> >> ir2 <- IRanges(c(1, 5, 7), c(4, 6, 9)) >> width(ir2) <- width(ir2) - 1 >> rir2 <- reduce(ir2) >> width(rir2) <- width(rir2) + 1 >> >> Or find the overlap, reduce those that did overlap and combine that >> result with those that did not overlap. >> >> >> >> Thanks for any assistance! >> >> >> Thanks for providing more use cases. We'll consider adding >> functionality along these lines to the base package (actually the >> reduce one has been on the TODO list for many months). >> >> >> >> Elizabeth Purdom >> Division of Biostatistics >> UC, Berkeley >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> <mailto:bioconductor at="" stat.math.ethz.ch=""> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 15.8 years ago Hervé Pagès 16k

0

Entering edit mode

Hi Elizabeth, I've added the punion(), pintersect() and psetdiff() functions (better naming suggested by Michael, the "p" stands for "parallel") to the IRanges package. Make sure you update your installation in about 24 hours and let me know if you have any question. Cheers, H. Hervé Pagès wrote: > Hi Elizabeth, > > Elizabeth Purdom wrote: >> Thanks for the help. It's much neater than what I had (and I do have >> gaps). If you are thinking about adding this kind of functionality, >> I'd be glad to tell you more specifically what I am doing if it helps >> to have a more general picture. >> >> I kept thinking the intersect or setdiff would help for different >> things, but it wound up not doing what I needed. For example, it >> doesn't do pairwise, but across the entire set, right? I often had a >> set of intervals that I wanted to 'subtract' pairwise from another set >> of intervals. Just to know, is there a function that does that? > > The union, setdiff or substraction of 2 ranges can not always be > represented > by a single range so we need to be careful when performing those operations > pairwise. > > For the pairwise intersection, no problem, the intersection of 2 ranges > can always be represented by a single range (eventually empty): > > pairwiseIntersect <- function(ir1, ir2) > { > if (length(ir1) != length(ir2)) > stop("'ir1' and 'ir2' must have the same length") > ans_start <- pmax.int(start(ir1), start(ir2)) > ans_end <- pmin.int(end(ir1), end(ir2)) > ans_width <- ans_end - ans_start + 1L > ans_width[ans_width < 0L] <- 0L > IRanges(start=ans_start, width=ans_width) > } > > > ir1 <- IRanges(c(1, 5, -2, 0, 14), c(10, 9, 3, 11, 17)) > > ir2 <- IRanges(c(14, 0, -5, 6, 18), c(20, 2, 2, 8, 20)) > > ir1 > IRanges object: > start end width > 1 1 10 10 > 2 5 9 5 > 3 -2 3 6 > 4 0 11 12 > 5 14 17 4 > > ir2 > IRanges object: > start end width > 1 14 20 7 > 2 0 2 3 > 3 -5 2 8 > 4 6 8 3 > 5 18 20 3 > > pairwiseIntersect(ir1, ir2) > IRanges object: > start end width > 1 14 13 0 <-- note the empty ranges when the intersection is > empty > 2 5 4 0 > 3 -2 2 5 > 4 6 8 3 > 5 18 17 0 > > The union of 2 ranges will produce 1 range if there is no gap between > the 2 ranges (i.e. they are overlapping or adjacent). So, in the general > case, it's not possible to return the result of the pairwise union in an > IRanges object of the same length as the input. > One solution is to add the 'fill.gap' argument to let the user control > whether s/he wants to enforce the union even when there is a gap > between the 2 ranges (in that case, the union is obtained by filling > this gap): > > pairwiseUnion <- function(ir1, ir2, fill.gap=FALSE) > { > if (length(ir1) != length(ir2)) > stop("'ir1' and 'ir2' must have the same length") > if (!fill.gap) { > gap <- pmax.int(start(ir1), start(ir2)) - > pmin.int(end(ir1), end(ir2)) - 1L > if (any(gap > 0L)) > stop("some pair of ranges have a gap within ", > "the 2 members of the pair.\n", > " Use 'fill.gap=TRUE' to enforce union ", > "by filling the gap.") > } > ans_start <- pmin.int(start(ir1), start(ir2)) > ans_end <- pmax.int(end(ir1), end(ir2)) > IRanges(start=ans_start, end=ans_end) > } > > > pairwiseUnion(ir1, ir2) > Error in pairwiseUnion(ir1, ir2) : > some pair of ranges have a gap within the 2 members of the pair. > Use 'fill.gap=TRUE' to enforce union by filling the gap. > > pairwiseUnion(ir1[3:5], ir2[3:5]) > IRanges object: > start end width > 1 -5 3 9 > 2 0 11 12 > 3 14 20 7 > > pairwiseUnion(ir1, ir2, fill.gap=TRUE) > IRanges object: > start end width > 1 1 20 20 <- gap was filled > 2 0 9 10 <- gap was filled > 3 -5 3 9 > 4 0 11 12 > 5 14 20 7 > > Subtracting range two from range one will produce 1 range except when > range two has its end points strictly inside range one. What can we do > in such situation? The following function will return an error when this > happens: > > pairwiseSubtract <- function(ir1, ir2) > { > if (length(ir1) != length(ir2)) > stop("'ir1' and 'ir2' must have the same length") > ans_start <- start(ir1) > ans_end <- end(ir1) > if (any((start(ir2) > ans_start) & (end(ir2) < ans_end))) > stop("some ranges in 'ir2' have their end points strictly > inside\n", > " the range in 'ir1' that they need to be subtracted from") > start2 <- pmax.int(ans_start, start(ir2)) > end2 <- pmin.int(ans_end, end(ir2)) > ii <- start2 <= end2 > jj <- end2 == ans_end > kk <- ii & jj > ans_end[kk] <- start2[kk] - 1L > kk <- ii & (!jj) > ans_start[kk] <- end2[kk] + 1L > IRanges(start=ans_start, end=ans_end) > } > > > pairwiseSubtract(ir1, ir2) > Error in pairwiseSubtract(ir1, ir2) : > some ranges in 'ir2' have their end points strictly inside > the range in 'ir1' that they need to be subtracted from > > start(ir2)[4] <- -99 > > end(ir2)[4] <- 99 > > pairwiseSubtract(ir1, ir2) > IRanges object: > start end width > 1 1 10 10 > 2 5 9 5 > 3 3 3 1 > 4 0 -1 0 <- nothing left in this range > 5 14 17 4 > > I will add these 3 functions to the IRanges package and will let you know > when this is done. > > Cheers, > > H. > >> I thought maybe 'narrow', but sometimes subtracting an interval would >> cut an existing interval into pieces, and narrow doesn't seem to do this. >> >> Best, >> Elizabeth >> >> Michael Lawrence wrote: >>> >>> >>> On Fri, Jan 23, 2009 at 10:24 PM, Elizabeth Purdom >>> <epurdom at="" stat.berkeley.edu="" <mailto:epurdom="" at="" stat.berkeley.edu="">> wrote: >>> >>> Hi, >>> >>> I am trying to take overlapping intervals and return a set of >>> intervals that are not overlapping but cover all of the region (and >>> mantain the intervals that don't overlap). In particular, I don't >>> want to merge intervals that overlap together (i.e. the reduce >>> function in IRanges)-- I want to cut them up into distinct regions. >>> For example, if I have intervals: >>> [1,6], [4,8], [7,10] >>> I want to get back the set of adjacent intervals: >>> [1,3],[4,6],[7,8],[9,10] >>> >>> >>> Well that's a fun one. >>> >>> ir <- IRanges(c(1, 4, 7), c(6, 8, 10)) >>> adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))), >>> sort(unique(c(end(ir), tail(start(ir),-1)-1)))) >>> >>> ... is a not so nice one, but pretty fast.. >>> >>> But if you had a gap in those ranges, like: >>> >>> ir <- IRanges(c(1, 4, 10), c(6, 8, 10)) >>> >>> So there's a gap at position 9, you would need an additional >>> filtering step: >>> >>> adj[adj %in% ir] >>> >>> This last step requires the devel version of IRanges, but can be >>> emulated using !is.na <http: is.na="">(overlap(ir, adj, multiple=FALSE)). >>> >>> >>> The options I find that look like they perhaps do this (intersect or >>> setdiff?) seem to be related to the 'normal' ranges class; but this >>> class requires a gap between intervals -- no adjacent intervals -- >>> which is not what I want. Is there a nice way to do this with >>> IRanges (or a not so nice one, but fast)? >>> >>> >>> The intersect and setdiff functions are for any Ranges, normal or >>> not. They return normal IRanges though. Perhaps the documentation >>> does not make this clear. They probably aren't very useful functions. >>> >>> >>> >>> Similarly, is there a 'reduce' version that doesn't merge adjacent >>> intervals but only truly overlapping ones? There are a lot of >>> annotation examples where you wouldn't not want to merge adjacent >>> intervals (e.g. UTRs) >>> >>> >>> Try a trick like this: >>> >>> ir2 <- IRanges(c(1, 5, 7), c(4, 6, 9)) >>> width(ir2) <- width(ir2) - 1 >>> rir2 <- reduce(ir2) >>> width(rir2) <- width(rir2) + 1 >>> >>> Or find the overlap, reduce those that did overlap and combine that >>> result with those that did not overlap. >>> >>> >>> >>> Thanks for any assistance! >>> >>> >>> Thanks for providing more use cases. We'll consider adding >>> functionality along these lines to the base package (actually the >>> reduce one has been on the TODO list for many months). >>> >>> >>> >>> Elizabeth Purdom >>> Division of Biostatistics >>> UC, Berkeley >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at stat.math.ethz.ch >>> <mailto:bioconductor at="" stat.math.ethz.ch=""> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 15.8 years ago Hervé Pagès 16k

0

Entering edit mode

It appears your solution assumes that the intervals are unique, right? Elizabeth Michael Lawrence wrote: > > > On Fri, Jan 23, 2009 at 10:24 PM, Elizabeth Purdom > <epurdom at="" stat.berkeley.edu="" <mailto:epurdom="" at="" stat.berkeley.edu="">> wrote: > > Hi, > > I am trying to take overlapping intervals and return a set of > intervals that are not overlapping but cover all of the region (and > mantain the intervals that don't overlap). In particular, I don't > want to merge intervals that overlap together (i.e. the reduce > function in IRanges)-- I want to cut them up into distinct regions. > For example, if I have intervals: > [1,6], [4,8], [7,10] > I want to get back the set of adjacent intervals: > [1,3],[4,6],[7,8],[9,10] > > > Well that's a fun one. > > ir <- IRanges(c(1, 4, 7), c(6, 8, 10)) > adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))), > sort(unique(c(end(ir), tail(start(ir),-1)-1)))) > > ... is a not so nice one, but pretty fast.. > > But if you had a gap in those ranges, like: > > ir <- IRanges(c(1, 4, 10), c(6, 8, 10)) > > So there's a gap at position 9, you would need an additional filtering step: > > adj[adj %in% ir] > > This last step requires the devel version of IRanges, but can be > emulated using !is.na <http: is.na="">(overlap(ir, adj, multiple=FALSE)). > > > The options I find that look like they perhaps do this (intersect or > setdiff?) seem to be related to the 'normal' ranges class; but this > class requires a gap between intervals -- no adjacent intervals -- > which is not what I want. Is there a nice way to do this with > IRanges (or a not so nice one, but fast)? > > > The intersect and setdiff functions are for any Ranges, normal or not. > They return normal IRanges though. Perhaps the documentation does not > make this clear. They probably aren't very useful functions. > > > > Similarly, is there a 'reduce' version that doesn't merge adjacent > intervals but only truly overlapping ones? There are a lot of > annotation examples where you wouldn't not want to merge adjacent > intervals (e.g. UTRs) > > > Try a trick like this: > > ir2 <- IRanges(c(1, 5, 7), c(4, 6, 9)) > width(ir2) <- width(ir2) - 1 > rir2 <- reduce(ir2) > width(rir2) <- width(rir2) + 1 > > Or find the overlap, reduce those that did overlap and combine that > result with those that did not overlap. > > > > Thanks for any assistance! > > > Thanks for providing more use cases. We'll consider adding functionality > along these lines to the base package (actually the reduce one has been > on the TODO list for many months). > > > > Elizabeth Purdom > Division of Biostatistics > UC, Berkeley > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch <mailto:bioconductor at="" stat.math.ethz.ch=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD REPLY • link 15.8 years ago Elizabeth Purdom ▴ 210

0

Entering edit mode

Hi Michael, I think for what I want to do, actually, you need some extra sorts and uniques in the call to head and tail to make these calls completely remove the first/last positions -- they might be replicated (this is true even if you have unique intervals). This wasn't the case in the example I sent you, but I hit the error when I tried to run it over my true data. For example: > ir <- IRanges(c(1,2,3,4),c(7,5,5,7)) > adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))), sort(unique(c(end(ir), tail(start(ir),-1)-1)))) Error in IRanges(sort(unique(c(start(ir), head(end(ir), -1) + 1))), sort(unique(c(end(ir), : 'start' and 'end' must have the same length #instead: > newadj <- IRanges(sort(unique(c(start(ir), head(unique(sort(end(ir))),-1)+1))), sort(unique(c(end(ir), tail(unique(sort(start(ir))),-1)-1)))) Best, Elizabeth Michael Lawrence wrote: > > > On Fri, Jan 23, 2009 at 10:24 PM, Elizabeth Purdom > <epurdom at="" stat.berkeley.edu="" <mailto:epurdom="" at="" stat.berkeley.edu="">> wrote: > > Hi, > > I am trying to take overlapping intervals and return a set of > intervals that are not overlapping but cover all of the region (and > mantain the intervals that don't overlap). In particular, I don't > want to merge intervals that overlap together (i.e. the reduce > function in IRanges)-- I want to cut them up into distinct regions. > For example, if I have intervals: > [1,6], [4,8], [7,10] > I want to get back the set of adjacent intervals: > [1,3],[4,6],[7,8],[9,10] > > > Well that's a fun one. > > ir <- IRanges(c(1, 4, 7), c(6, 8, 10)) > adj <- IRanges(sort(unique(c(start(ir), head(end(ir),-1)+1))), > sort(unique(c(end(ir), tail(start(ir),-1)-1)))) > > ... is a not so nice one, but pretty fast.. > > But if you had a gap in those ranges, like: > > ir <- IRanges(c(1, 4, 10), c(6, 8, 10)) > > So there's a gap at position 9, you would need an additional filtering step: > > adj[adj %in% ir] > > This last step requires the devel version of IRanges, but can be > emulated using !is.na <http: is.na="">(overlap(ir, adj, multiple=FALSE)). > > > The options I find that look like they perhaps do this (intersect or > setdiff?) seem to be related to the 'normal' ranges class; but this > class requires a gap between intervals -- no adjacent intervals -- > which is not what I want. Is there a nice way to do this with > IRanges (or a not so nice one, but fast)? > > > The intersect and setdiff functions are for any Ranges, normal or not. > They return normal IRanges though. Perhaps the documentation does not > make this clear. They probably aren't very useful functions. > > > > Similarly, is there a 'reduce' version that doesn't merge adjacent > intervals but only truly overlapping ones? There are a lot of > annotation examples where you wouldn't not want to merge adjacent > intervals (e.g. UTRs) > > > Try a trick like this: > > ir2 <- IRanges(c(1, 5, 7), c(4, 6, 9)) > width(ir2) <- width(ir2) - 1 > rir2 <- reduce(ir2) > width(rir2) <- width(rir2) + 1 > > Or find the overlap, reduce those that did overlap and combine that > result with those that did not overlap. > > > > Thanks for any assistance! > > > Thanks for providing more use cases. We'll consider adding functionality > along these lines to the base package (actually the reduce one has been > on the TODO list for many months). > > > > Elizabeth Purdom > Division of Biostatistics > UC, Berkeley > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch <mailto:bioconductor at="" stat.math.ethz.ch=""> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > >

ADD REPLY • link 15.8 years ago Elizabeth Purdom ▴ 210

Login before adding your answer.