mmmh... fast but incorrect :-/
> x <- GRanges("chr1", IRanges(1:3, width=1))
> y <- GRanges("chr1", IRanges(c(10,20,10), width=1))
> reduceTwoGRanges(x, y)
[[1]]
GRanges with 2 ranges and 0 metadata columns:
seqnames ranges strand
<rle> <iranges> <rle>
[1] chr1 [1, 3] *
[2] chr1 [2, 2] *
---
seqlengths:
chr1
NA
[[2]]
GRanges with 2 ranges and 0 metadata columns:
seqnames ranges strand
<rle> <iranges> <rle>
[1] chr1 [10, 10] *
[2] chr1 [20, 20] *
---
seqlengths:
chr1
NA
Michael's approach based on detection of the connected components
of the graph of overlapping pairs is the way to go. Note that if
you want to stay close to the semantic of reduce(), you need to
call findOverlaps() with 'maxgap=1' though, so adjacent ranges are
considered to overlap. (The fact that we have to use 'maxgap=1'
for this is confusing because when ranges are adjacent the gap
between them is 0 not 1, but that's another story.)
Cheers,
H.
On 04/08/2014 12:55 PM, Niu, Liang (NIH/NIEHS) [E] wrote:
> Thanks!
> ________________________________________
> From: Hervé Pagès [hpages at fhcrc.org]
> Sent: Tuesday, April 08, 2014 3:42 PM
> To: Michael Lawrence; Niu, Liang (NIH/NIEHS) [E]
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] How to reduce two GRanges simutaneously
>
> Hi Liang, Michael,
>
> Yeah, an interesting problem indeed.
>
> Here is my take on it:
>
> # Reduce 'x' and 'y' simultaneously. Returns a list of 2
parallel
> # GRanges, the 1st one is reduced 'x' and the 2nd one reduced
'y'.
> reduceTwoGRanges <- function(x, y)
> {
> if (length(x) != length(y))
> stop("'x' and 'y' must have the same length")
> rx_revmap <- mcols(reduce(x, with.revmap=TRUE))$revmap
> ry_revmap <- mcols(reduce(y, with.revmap=TRUE))$revmap
> rx_map <- integer(length(x))
> rx_map[unlist(rx_revmap, use.names=FALSE)] <-
togroup(rx_revmap)
> ry_map <- integer(length(y))
> ry_map[unlist(ry_revmap, use.names=FALSE)] <-
togroup(ry_revmap)
> sm <- IRanges:::selfmatchIntegerPairs(rx_map, ry_map)
> ans1 <- unlist(range(split(x, sm)), use.names=FALSE)
> ans2 <- unlist(range(split(y, sm)), use.names=FALSE)
> list(ans1, ans2)
> }
>
> Cheers,
> H.
>
> On 04/08/2014 05:37 AM, Michael Lawrence wrote:
>> This has got to be one of the most complex problems posed here.
>>
>> Here is one approach, but it may be completely wrong:
>>
>> hits_x <- findOverlaps(x)
>> hits_y <- findOverlaps(y)
>> hits_both <- intersect(hits_x, hits_y)
>>
>> I think the constraint of mutual overlap in x and y is now
satisfied. In
>> order to reduce, we have to think of the Hits object as a graph,
where
>> the connected components are those ranges that need to be reduced.
>>
>> g <- graph::graphNEL(as.character(seq_len(length(x))),
>> split(subjectHits(hits_both), queryHits(hits_both)))
>> comp <- unname(RBGL::connectedComp(g)) # unname just for efficiency
>>
>> That graphNEL constructor might kill your performance? There are
other
>> ways to build the graph. Now we can form GRangesLists from the
connected
>> components, and since we know they all overlap, just get the range,
and
>> unlist them back to GRanges:
>>
>> ids <- as.integer(unlist(comp))
>> reduced_x <- unlist(range(relist(x[ids], comp)))
>> reduced_y <- unlist(range(relist(y[ids], comp)))
>>
>> And combine the result into a DataFrame with the range counts:
>>
>> answer <- DataFrame(x=reduced_x, y=reduced_y,
count=elementLengths(comp))
>>
>> Hopefully that gets you closer. This hasn't been tested at all.
>>
>> Michael
>>
>>
>>
>> On Mon, Apr 7, 2014 at 11:40 AM, Niu, Liang (NIH/NIEHS) [E]
>> <liang.niu at="" nih.gov="" <mailto:liang.niu="" at="" nih.gov="">> wrote:
>>
>> Dear Herve,
>>
>> I have a question: suppose that I have two GRanges of the same
>> length, say x and y, where x[i] and y[i] are two genomic
ranges that
>> are paired. What I want to do is to reduce x and y
simultaneously,
>> i.e., combine (x[i],y[i]) and (x[j],y[j]) when x[i] overlaps
with
>> x[j] and y[i] overlaps with y[j]. Ideally, I would like an
output in
>> the following format:
>>
>> range_x range_y count
>> chr?:???-??? chr?:???-??? ?
>>
>> where the range_x and range_y are the reduced ranges, and
count is
>> the number of records associated with the chr?:???-??? and
>> chr?:???-???. How can I do this?
>>
>> Thanks!
>>
>> Liang
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org="">
>>
https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>>
http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>
> --
> Hervé Pagès
>
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA 98109-1024
>
> E-mail: hpages at fhcrc.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319