Extract coordinates of overlapping genomic intervals
rubi ▴ 110
@rubi-6462

Hi,

I have two sets of genomic ranges which I'm intersecting using findOverlaps from the GenomicRanges package:

library(GenomicRanges)

df1 <- data.frame(chr=rep("chr1",6), start=c(10033259,10060726,98674166,10067579,10067607,11169988), end=c(10033289,10060783,98674223,10067654,10067664,11170044), strand=c("-","-","+","+","+","+"))

df2 <- data.frame(chr=rep("chr1",3),start=c(10024601,10033258,10033258),end=c(10038168,10033323,10033323),strand=c("-","-","-"))

df1.gr <- makeGRangesFromDataFrame(df1,seqnames.field="chr",start.field="start",end.field="end",strand.field="strand")

df2.gr <- makeGRangesFromDataFrame(df2,seqnames.field="chr",start.field="start",end.field="end",strand.field="strand")

dfs.ol <- findOverlaps(df1.gr,df2.gr)

My question is how to extract the actual overlapping coordinates of each of the hits in the returned value of findOverlaps (dfs.ol)?

I know that the intersect function returns the collapsed intervals shared between the query and subject genomic ranges. But what I really need, for each overlap between df1.gr and df2.gr, is the coordinates of the overlapping region, in addition to the indices of the overlapping ranges (as given in the returned Hits object).

@jeff-johnston-6497

You can use pintersect:

overlaps.gr <- pintersect(df1.gr[queryHits(dfs.ol)], df2.gr[subjectHits(dfs.ol)])

If you want all the results in one object, you can add the indices as metadata columns:

overlaps.gr$df1_hit <- queryHits(dfs.ol)
overlaps.gr$df2_hit <- subjectHits(dfs.ol)
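
As a minimal sketch of how this plays out on the example data from the question (assuming the GenomicRanges package is installed; the comments describe what these particular inputs should give, not pasted console output):

```r
library(GenomicRanges)

# Example data copied from the question
df1 <- data.frame(chr = rep("chr1", 6),
                  start = c(10033259, 10060726, 98674166, 10067579, 10067607, 11169988),
                  end = c(10033289, 10060783, 98674223, 10067654, 10067664, 11170044),
                  strand = c("-", "-", "+", "+", "+", "+"))
df2 <- data.frame(chr = rep("chr1", 3),
                  start = c(10024601, 10033258, 10033258),
                  end = c(10038168, 10033323, 10033323),
                  strand = c("-", "-", "-"))

df1.gr <- makeGRangesFromDataFrame(df1, seqnames.field = "chr", start.field = "start",
                                   end.field = "end", strand.field = "strand")
df2.gr <- makeGRangesFromDataFrame(df2, seqnames.field = "chr", start.field = "start",
                                   end.field = "end", strand.field = "strand")

# One row per overlapping query/subject pair (strand-aware by default)
dfs.ol <- findOverlaps(df1.gr, df2.gr)

# Parallel intersection: element i is df1.gr[queryHits][i] intersected with
# df2.gr[subjectHits][i], so the result has one range per hit
overlaps.gr <- pintersect(df1.gr[queryHits(dfs.ol)], df2.gr[subjectHits(dfs.ol)])

# Keep track of which original ranges produced each overlap
overlaps.gr$df1_hit <- queryHits(dfs.ol)
overlaps.gr$df2_hit <- subjectHits(dfs.ol)

# Only the first range of df1 overlaps anything here, and it lies entirely
# inside all three df2 ranges, so each of the three overlaps is
# chr1:10033259-10033289 on the minus strand
overlaps.gr
```

Because pintersect works element by element, both arguments must have the same length, which is why each GRanges is first indexed by the corresponding column of the Hits object.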

 


Does that report the overlap interval though?

 


Yes, it generates the overlapping interval for each row (a query/subject pair) in your Hits object.


In the overlaps.gr object?

I only see the indices of the query and subject hits, but not the coordinates of the overlap. How do you extract those?


Since overlaps.gr is a GRanges object, you can use start(), end() and seqnames() to extract the coordinates of the overlapping intervals.
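
For instance, a small sketch of pulling the coordinates out, either with the accessors or as a plain data frame. It rebuilds only the ranges from the question that are involved in an overlap, so it runs on its own; overlaps.df is just an illustrative name.

```r
library(GenomicRanges)

# The first range of df1 and the three ranges of df2 from the question
df1.gr <- GRanges("chr1", IRanges(start = 10033259, end = 10033289), strand = "-")
df2.gr <- GRanges("chr1",
                  IRanges(start = c(10024601, 10033258, 10033258),
                          end = c(10038168, 10033323, 10033323)),
                  strand = "-")

dfs.ol <- findOverlaps(df1.gr, df2.gr)
overlaps.gr <- pintersect(df1.gr[queryHits(dfs.ol)], df2.gr[subjectHits(dfs.ol)])
overlaps.gr$df1_hit <- queryHits(dfs.ol)
overlaps.gr$df2_hit <- subjectHits(dfs.ol)

# Accessors return one value per overlap
as.character(seqnames(overlaps.gr))  # chromosome of each overlap
start(overlaps.gr)                   # overlap start coordinates
end(overlaps.gr)                     # overlap end coordinates
width(overlaps.gr)                   # overlap lengths in bp

# Or flatten everything, including the hit indices, into one data frame
overlaps.df <- as.data.frame(overlaps.gr)
overlaps.df[, c("seqnames", "start", "end", "width", "strand", "df1_hit", "df2_hit")]
```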
