Question

GenomicRanges Use Cases - subsetByOverlaps


James Perkins ▴ 120

@james-perkins-4948


Hi,

I am having some problems following the example in the vignette for GenomicRanges, specifically:

    3.4 Identifying reads that do NOT overlap known annotation
    ...
    > filtData <- subsetByOverlaps(aligns, exonRanges)
    > length(filtData)
    [1] 17311

    At this point, the filtData object only contains ranges that did not
    overlap with any of the known exons from Saccharomyces cerevisiae.

My understanding of subsetByOverlaps is that it would bring back exactly the ranges that DO overlap with the known exons? From the documentation:

    'subsetByOverlaps(query, subject, maxgap = 0L, minoverlap = 1L,
        type = c("any", "start", "end", "within", "equal"))':
        Returns the subset of 'query' that has an overlap hit with a
        range in 'subject' using the specified 'findOverlaps' parameters.
        Both 'query' and 'subject' should be 'Ranges', 'RangesList'
        or 'RangedData' objects.

I don't see how this gets the reads mapping in non-exon ranges. Surely it gets the reads mapping in the exon ranges, since exonRanges is obtained using:

    exonRanges <- exonsBy(txdb, "tx")

Shouldn't I be looking for the subset that *doesn't* overlap? Something like subsetByOverlaps(! aligns, exonRanges)? Or have I missed something obvious (quite likely!)?

Many thanks,

Jim

--
James Perkins, PhD student
Institute of Structural and Molecular Biology
Division of Biosciences
University College London
Gower Street
London, WC1E 6BT
UK
email: j.perkins at ucl.ac.uk
phone: 0207 679 2198


updated 13.1 years ago by Steve Lianoglou • written 13.1 years ago by James Perkins

score 0 · Answer 1 · 2011-11-08

Hi,

On Tue, Nov 8, 2011 at 4:27 AM, James Perkins <j.perkins at ucl.ac.uk> wrote:
> Hi,
>
> I am having some problems following the example in the vignette for
> GenomicRanges, specifically:
>
> 3.4 Identifying reads that do NOT overlap known annotation
> ...
>> filtData <- subsetByOverlaps(aligns, exonRanges)
>> length(filtData)
> [1] 17311
> At this point, the filtData object only contains ranges that did not
> overlap with any of the known exons from Saccharomyces cerevisiae.
>
> My understanding of subsetByOverlaps is that it would bring back
> exactly the ranges that DO overlap with the known exons?
>
> 'subsetByOverlaps(query, subject, maxgap = 0L, minoverlap = 1L, type =
>          c("any", "start", "end", "within", "equal"))': Returns the
>          subset of 'query' that has an overlap hit with a range in
>          'subject' using the specified 'findOverlaps' parameters.
>          Both 'query' and 'subject' should be 'Ranges', 'RangesList'
>          or 'RangedData' objects.
>
> I don't see how this gets the reads mapping in non-exon ranges. Surely
> it gets the reads mapping in the exon ranges? since exonRanges is
> obtained using:
>
> exonRanges <- exonsBy(txdb, "tx")
>
> Shouldn't I be looking for the subset that *doesn't* overlap?
> Something like subsetByOverlaps(! aligns, exonRanges)? Or have I
> missed something obvious (quite likely!)?

One thing you can do is call `gaps` on your exonRanges to get the regions between exons, and then keep the reads that hit those "gaps":

    R> not.exons <- subsetByOverlaps(aligns, gaps(exonRanges))

This will still return reads that partially overlap both exonic and non-exonic regions.

You can also do `! ... %in% ...`:

    R> not.exons <- aligns[!aligns %in% exonRanges]

This will (should) only return reads that don't overlap with any `exonRanges` at all.
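For anyone reading this with a more recent Bioconductor, here is a small sketch of the two approaches above on toy data. The objects `exons` and `reads` are made up for illustration and are not the vignette's `exonRanges` and `aligns`. The `invert` argument, `overlapsAny()` and `%over%` come from releases later than this thread, so treat their availability in your installation as an assumption. As far as I know, `%in%` on GRanges objects was later changed to test for identical ranges rather than overlaps, which is why the sketch uses `%over%` instead. The toy `exons` is a flat GRanges; `exonsBy(txdb, "tx")` returns a list of ranges per transcript, so calling `gaps` on it may first need something like `unlist()`.

```r
library(GenomicRanges)

# Toy data (hypothetical): three "exons" and four 36 bp "reads" on one chromosome.
exons <- GRanges("chrI", IRanges(start = c(100, 300, 500), end = c(200, 400, 600)))
reads <- GRanges("chrI", IRanges(start = c(50, 150, 190, 250), width = 36))
# reads cover 50-85   (no exon),
#             150-185 (inside exon 1),
#             190-225 (straddles the end of exon 1),
#             250-285 (no exon)

# Default behaviour: keep the reads that DO overlap an exon.
hits <- subsetByOverlaps(reads, exons)

# Approach 1: overlap against the gaps between exons.
# This can also keep reads that straddle an exon boundary.
in_gaps <- subsetByOverlaps(reads, gaps(exons))

# Approach 2: keep only the reads that overlap no exon at all.
no_hits  <- subsetByOverlaps(reads, exons, invert = TRUE)
no_hits2 <- reads[!overlapsAny(reads, exons)]
no_hits3 <- reads[!(reads %over% exons)]

length(hits)     # reads overlapping an exon
start(no_hits)   # start positions of reads overlapping no exon
```

If the goal is "reads with no exonic overlap whatsoever", the second approach is the one to use; the `gaps` version answers the slightly different question "reads that touch non-exonic sequence".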
HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact