IRanges problem: findOverlaps
@nicolas-descostes-5388
Last seen 10.2 years ago
Dear members,

I have a set of ChIP-seq peaks that I want to overlap with a bunch of annotations downloaded from UCSC.

When I compute the overlap between my peaks and the annotation list, I find 5635 positive matches. When I add type = "start", no results are returned. However, when I visualize the 5635 intervals, they overlap many start sites of my gene annotations.

I tried updating IRanges but I still get no results.

Any ideas?

Thanks.
IRanges IRanges • 1.9k views
@jonathan-cairns-4111
Last seen 10.2 years ago
Hi,

From ?findOverlaps:

"If 'type' is 'start' or 'end', the intervals are required to have matching starts or ends, respectively."

This doesn't sound like what you're after: type = "start" compares only the start of each peak with the start of each annotation, so a peak that merely covers a gene start is not reported.

Have you looked at the package ChIPpeakAnno? It is designed for annotating ChIP-seq peaks and might be useful.

Jonathan
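As an illustration of the documentation passage quoted above, here is a minimal sketch with made-up coordinates. The objects `peaks` and `genes` are hypothetical stand-ins, not the original poster's data, and plain IRanges objects are assumed for brevity (no chromosome or strand).

```r
library(IRanges)

# Hypothetical coordinates, chosen only to illustrate the two overlap types.
peaks <- IRanges(start = c(95, 200, 500), end = c(120, 260, 540))
genes <- IRanges(start = c(100, 200, 900), end = c(400, 800, 950))

# Default type = "any": every pair sharing at least one position is a hit.
any_hits <- findOverlaps(peaks, genes)

# type = "start": only pairs whose start coordinates coincide are hits.
start_hits <- findOverlaps(peaks, genes, type = "start")

length(any_hits)
length(start_hits)
```

In this toy example the first peak covers the start of the first gene (position 100) but itself begins at 95, so the default call reports it while type = "start" does not. Only the second peak and the second gene share a start coordinate.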
How about

tss <- resize(genes, 1, fix = 'start')
findOverlaps(peaks, tss)

However, +1 for Julie Zhu's ChIPpeakAnno package since, as was just pointed out, it is meant for such things!
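The resize() suggestion can be sketched as follows, assuming the peaks and genes are held as GRanges objects (the coordinates below are invented for illustration). The resized object has to be assigned and then passed to findOverlaps(); the name `tss` is just a label chosen here.

```r
library(GenomicRanges)

# Hypothetical peaks (unstranded) and genes (stranded) on one chromosome.
peaks <- GRanges("chr1",
                 IRanges(start = c(90, 300, 790), end = c(110, 350, 810)))
genes <- GRanges("chr1",
                 IRanges(start = c(100, 500), end = c(400, 800)),
                 strand = c("+", "-"))

# Shrink each gene to a 1-bp range anchored at its start.
tss <- resize(genes, width = 1, fix = "start")

# Peaks that cover a gene start; unstranded peaks match either strand.
hits <- findOverlaps(peaks, tss)
hits
```

On stranded GRanges, resize(..., fix = "start") is strand-aware, so for the minus-strand gene the single retained base is its end coordinate (800 here). The first and third peaks each cover one gene start; the second peak lies inside a gene body and is not reported.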
Thanks, all, for your replies. I just tried ChIPpeakAnno and it works well.

Cheers.
@alessandro-brozzi-5276
Last seen 9.9 years ago
European Union
Hi Nicolas,

I don't know IRanges very well, but for finding overlapping intervals the package "intervals"

http://cran.fhcrc.org/web/packages/intervals/index.html

is very straightforward. Here is an example:

    library(intervals)

    peaks = matrix(c(2, 8, 8, 9, 6, 9, 11, 12, 3, 3), ncol = 2, byrow = TRUE)
    peaks
         [,1] [,2]
    [1,]    2    8
    [2,]    8    9
    [3,]    6    9
    [4,]   11   12
    [5,]    3    3

    track = matrix(c(2, 8, 3, 4, 5, 10), ncol = 2, byrow = TRUE)
    track
         [,1] [,2]
    [1,]    2    8
    [2,]    3    4
    [3,]    5   10

    interval_overlap(Intervals(peaks), Intervals(track))
    [[1]]
    [1] 1 2 3

    [[2]]
    [1] 1 3

    [[3]]
    [1] 1 3

    [[4]]
    integer(0)

    [[5]]
    [1] 1 2

The result is a list: for each peak you get the indices of the overlapping rows of the track matrix. By setting some options you can fine-tune your search.

HTH,
Alex
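For comparison, the same toy data can be run through findOverlaps() from IRanges. This is only a sketch: it assumes integer, closed intervals (which is how IRanges treats ranges), and the grouping step with split() is one of several ways to get a per-peak list.

```r
library(IRanges)

# Same toy coordinates as in the matrices above.
peaks <- IRanges(start = c(2, 8, 6, 11, 3), end = c(8, 9, 9, 12, 3))
track <- IRanges(start = c(2, 3, 5), end = c(8, 4, 10))

hits <- findOverlaps(peaks, track)

# Group subject (track) indices by query (peak) index; the factor keeps
# peaks without any hit as empty entries.
by_peak <- split(subjectHits(hits),
                 factor(queryHits(hits), levels = seq_along(peaks)))
by_peak <- lapply(by_peak, sort)
by_peak
```

With these inputs the per-peak indices should agree with the interval_overlap() result shown above, including the empty entry for the fourth peak.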
Hi Alex,

I am not sure interval_overlap() is a better option than findOverlaps().

Is interval_overlap() aware that intervals may lie on different chromosomes?

In other words, every peak has three coordinates: chromosome, start and end. The findOverlaps() function is aware of that.

Thank you,

Ivan

Ivan Gregoretti, PhD
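Ivan's point about chromosomes can be checked with a small made-up example, assuming GRanges objects from GenomicRanges. Two peaks share the same coordinates but sit on different chromosomes; dropping to bare IRanges with ranges() discards the chromosome and changes the answer.

```r
library(GenomicRanges)

# Hypothetical data: identical coordinates on two different chromosomes.
peaks <- GRanges(c("chr1", "chr2"),
                 IRanges(start = c(100, 100), end = c(200, 200)))
genes <- GRanges("chr1", IRanges(start = 150, end = 300))

# Chromosome-aware: only the chr1 peak can overlap the chr1 gene.
gr_hits <- findOverlaps(peaks, genes)

# Coordinates only: the chromosome is ignored, so both peaks match.
ir_hits <- findOverlaps(ranges(peaks), ranges(genes))

length(gr_hits)
length(ir_hits)
```

This is why a purely coordinate-based tool needs the data split by chromosome first, whereas the GRanges method handles that bookkeeping itself.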
Hi all,

I'm not sure about intervals, but genomeIntervals is chromosome-aware (http://bioconductor.org/packages/release/bioc/html/genomeIntervals.html). The syntax is as described by Alex, since genomeIntervals extends intervals. Whether you use genomeIntervals or IRanges is really just a matter of taste.

Cheers,

Nico

Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
@julien-gagneur-2045
Last seen 10.2 years ago
Indeed, Nico.

interval_overlap() can moreover be strand-specific (on Genome_intervals_stranded objects). It also deals with so-called "inter-base" positions, i.e., positions between two nucleotides that represent, for example, insertion points or restriction enzyme cutting sites. One can thus ask whether cutting sites occur within a set of exons without having to write extra code to handle cutting sites that fall right at exon boundaries.

Best,

Julien

http://www.gagneur.genzentrum.lmu.de/
Just for the record, findOverlaps on GenomicRanges objects is strand-specific. In theory, insertion points could be represented internally as ranges where end = start - 1. We could then have a higher-level class that makes that more user-friendly. We haven't had a use case yet, though.

Michael
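A small sketch of both remarks, using invented coordinates: the strand-aware behaviour of findOverlaps() on GRanges, and a zero-width IRanges as one possible encoding of an insertion point. How zero-width ranges behave in overlap queries is deliberately left out here, since that depends on arguments such as minoverlap.

```r
library(GenomicRanges)

# Hypothetical ranges that share coordinates but lie on opposite strands.
query   <- GRanges("chr1", IRanges(100, 200), strand = "+")
subject <- GRanges("chr1", IRanges(150, 250), strand = "-")

# By default, ranges on opposite strands are not considered overlapping.
stranded_hits <- findOverlaps(query, subject)

# ignore.strand = TRUE compares coordinates only.
unstranded_hits <- findOverlaps(query, subject, ignore.strand = TRUE)

# A zero-width range: its end is one less than its start, which can stand
# for the position between bases 150 and 151.
insertion <- IRanges(start = 151, width = 0)
c(start = start(insertion), end = end(insertion), width = width(insertion))
```

Unstranded ranges (strand "*") match either strand, so the strand check only matters when both sides carry a "+" or "-".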