How to extract positions present in one genomic ranges object and not the other?
1
0
Entering edit mode
Grinch • 0
@grinch-8455
Last seen 18 months ago
Sweden

I have two Granges objects that overlap in 90% of positions. I am interested in the positions that do not overlap and are only present in one Granges object, but not the other. How to extract those positions?

genomicranges r overlap • 2.3k views
ADD COMMENT
2
Entering edit mode
Kevin Blighe ★ 4.0k
@kevin
Last seen 24 days ago
Republic of Ireland

To do this, one way is via negative indices, as I elaborate in my answer on my Biostars: https://www.biostars.org/p/263214/#337057

So, a reproducible example:

1, create the GRanges objects

require(GenomicRanges)
gr1 <- makeGRangesFromDataFrame(
    data.frame(
        chr = rep('chr1',3),
        start = c(1,100,500),
        end = c(50,200,1000),
        gene = c('ENSG1','ENSG2','ENSG3')),
        keep.extra.columns = TRUE)

gr2 <- makeGRangesFromDataFrame(
    data.frame(
        chr=rep('chr1',3),
        start=c(100,500,1001),
        end=c(200,1000,1050),
        gene=c("ENSG2","ENSG3","ENSG4")),
        keep.extra.columns=TRUE)

gr1
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1      1-50      * |    ENSG1
  [2]      chr1   100-200      * |    ENSG2
  [3]      chr1  500-1000      * |    ENSG3

gr2
GRanges object with 3 ranges and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1   100-200      * |    ENSG2
  [2]      chr1  500-1000      * |    ENSG3
  [3]      chr1 1001-1050      * |    ENSG4

2a, find the overlapping segments for gr1 / gr2

idx_overlaps <- queryHits(findOverlaps(gr1, gr2, type = 'any'))

2b, subtract out the overlapping segments to leave non-overlapping

gr1[-idx_overlaps,] 
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]     chr1      1-50      * |    ENSG1

3, repeat the other way around (gr2 / gr1)

gr2[-queryHits(findOverlaps(gr2, gr1, type = 'any')),]
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand |     gene
         <Rle> <IRanges>  <Rle> | <factor>
  [1]      chr1 1001-1050      * |    ENSG4

Kevin

ADD COMMENT
3
Entering edit mode

Another option is subsetByOverlaps(gr1, gr2, invert=TRUE). Also, setdiff(gr1, gr2) will return the positions covered in one but not the other (position-based instead of element-based).

ADD REPLY

Login before adding your answer.

Traffic: 737 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6