Find hits of subject that start somewhere in the query by using GRanges / IRanges and findOverlaps()
svenbioinf • 0
@svenbioinf-11239
Last seen 2.3 years ago
Münster

Using findOverlaps() on a GRanges object, I would like to retrieve the following hits:

 

Sub:  -------|||||||||||||------------------

Query Ranges:

Hit5: ---------||||||||||||||||||------------

Hit6: -----------|||||||||||||||||||---------

Hit4: ---------|||||||---------------------

.

.

.

That is, I want the query ranges whose start lies inside a subject range, whether or not they extend beyond its end.

Minimal example:

> sub <- GRanges(c(1),strand=Rle(c("+"),c(1)), IRanges(c(5), c(7)),mcols=data.frame(id=c("T1")))

> sub
GRanges object with 1 range and 1 metadata column:
      seqnames    ranges strand | mcols.id
         <Rle> <IRanges>  <Rle> | <factor>
  [1]        1    [5, 7]      + |       T1
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> query <- GRanges(c(1,1,1,1,1,1),strand=Rle(c("+","+","+","+","+","+"),c(1,1,1,1,1,1)), IRanges(c(4,4,6,6,7,7), c(5,5,6,6,8,8)),mcols=data.frame(id=c("T8","T9","T10","T11","T12","T13")))
> query
GRanges object with 6 ranges and 1 metadata column:
      seqnames    ranges strand | mcols.id
         <Rle> <IRanges>  <Rle> | <factor>
  [1]        1    [4, 5]      + |       T8
  [2]        1    [4, 5]      + |       T9
  [3]        1    [6, 6]      + |      T10
  [4]        1    [6, 6]      + |      T11
  [5]        1    [7, 8]      + |      T12
  [6]        1    [7, 8]      + |      T13
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

 

I tried countOverlaps() with type="start", but that only counts hits that start at exactly the same position as the subject.

> sum(countOverlaps(query,sub))
[1] 6
> sum(countOverlaps(query,sub,type="start"))
[1] 0

 

There must be a way, thanks for looking into that!

@michael-lawrence-3846
Last seen 2.6 years ago
United States
findOverlaps(start(query), subject)


Hi Michael! Oh, I understand what you are doing here!
However:

findOverlaps(start(query), subject)

Here, subject has to be an IRanges object, which does not carry strand information. So I extract the IRanges with ranges(sub) and then have to take care of the strand information myself.

 

This is a nice solution, thank you very much Michael!
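
For readers who want to try the IRanges route described in this comment, here is a minimal sketch of what "taking care of the strand myself" could look like. It reuses the data from the question's minimal example; the names `starts`, `same_seq`, `same_strand` and `kept` are illustrative only, and the plain equality test on strand is an assumption that ignores the "*" wildcard. Note also that start() returns the leftmost coordinate, which is not the 5' end for ranges on the "-" strand.

```r
library(GenomicRanges)

# Data from the question's minimal example
sub <- GRanges("1", IRanges(5, 7), strand = "+", id = "T1")
query <- GRanges("1",
                 IRanges(start = c(4, 4, 6, 6, 7, 7),
                         end   = c(5, 5, 6, 6, 8, 8)),
                 strand = "+",
                 id = c("T8", "T9", "T10", "T11", "T12", "T13"))

# Width-1 IRanges at each query start; seqnames and strand are dropped here
starts <- IRanges(start(query), width = 1L)
hits <- findOverlaps(starts, ranges(sub))

# Restore the checks that the GRanges method would otherwise do for us
q <- queryHits(hits)
s <- subjectHits(hits)
same_seq    <- as.character(seqnames(query))[q] == as.character(seqnames(sub))[s]
same_strand <- as.character(strand(query))[q]   == as.character(strand(sub))[s]

kept <- hits[same_seq & same_strand]
query$id[queryHits(kept)]
```

On this example the two ranges starting at position 4 are dropped, since 4 lies outside the subject range [5, 7].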


Sorry, here is a better way for GRanges:

findOverlaps(resize(query, 1L), subject)
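
As an illustration, the resize() suggestion can be applied to the minimal example from the question. This is a sketch rather than a definitive recipe: the subject is called `sub` as in the question, the metadata column is named `id` here instead of `mcols.id`, and `query_starts` is an illustrative name.

```r
library(GenomicRanges)

# Data from the question's minimal example
sub <- GRanges("1", IRanges(5, 7), strand = "+", id = "T1")
query <- GRanges("1",
                 IRanges(start = c(4, 4, 6, 6, 7, 7),
                         end   = c(5, 5, 6, 6, 8, 8)),
                 strand = "+",
                 id = c("T8", "T9", "T10", "T11", "T12", "T13"))

# Shrink each query range to its first base. resize() on a GRanges is
# strand-aware: with fix = "start", ranges on the "-" strand are anchored
# at their end coordinate (their 5' end).
query_starts <- resize(query, width = 1L, fix = "start")

# Overlaps are now computed on seqnames, strand and the start position only
hits <- findOverlaps(query_starts, sub)
query$id[queryHits(hits)]
# "T10" "T11" "T12" "T13"
```

The two ranges [4, 5] overlap the subject but start before it, so they are excluded. Because the GRanges method of findOverlaps() already compares seqnames and strand, no manual strand handling is needed.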