Most efficient way to compute width of overlap of multiple features
1
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
So it seems like from twitter that you still haven't solved this. I'm pretty sure this is as simple as: a$overlap.with.b <- viewSums(Views(coverage(reduce(b)), a)) But maybe I'm not exactly understanding what you want. Michael On Mon, Jan 6, 2014 at 2:52 PM, Vince S. Buffalo <vsbuffalo@gmail.com>wrote: > Hi Bioconductor folks, > > I'm trying to create some GRanges summaries, but I think I may be missing > an obvious solution. I have fixed-width windows as a GRanges object, and > for each window/row I'd like to add a metadata column that indicates how > many base pairs of this window overlap features in another GRange object. > I'll need to add a few columns for different features in different GRanges > objects. > > I've tried using the approach of findOverlaps, followed by ranges() to > extract range widths. This creates an error: "'query' must be a Ranges of > length equal to number of queries". I saw in the source > that pintersect(query[queryHits(x)], subject[subjectHits(x)]) works too > (and does without error). This returns the overlapping ranges, but it'd > take a load of data munging to get it into the format I'd like -- it seems > like I may be overlooking an easier solution. > > thanks, > Vince > > !> sessionInfo() > R version 3.0.2 (2013-09-25) > Platform: x86_64-apple-darwin13.0.1 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] ggplot2_0.9.3.1 rtracklayer_1.22.0 GenomicRanges_1.14.3 > [4] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 > [7] ESSR_1.0.1 > > loaded via a namespace (and not attached): > [1] Biostrings_2.30.1 bitops_1.0-6 BSgenome_1.30.0 > colorspace_1.2-4 > [5] compiler_3.0.2 dichromat_2.0-0 digest_0.6.4 grid_3.0.2 > [9] gtable_0.1.2 labeling_0.2 MASS_7.3-29 > munsell_0.4.2 > [13] plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5 > RCurl_1.95-4.1 > [17] reshape2_1.2.2 Rsamtools_1.14.2 scales_0.2.3 stats4_3.0.2 > [21] stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 > zlibbioc_1.8.0 > > -- > Vince Buffalo > Ross-Ibarra Lab www.rilab.org) > Plant Sciences, UC Davis > > [[alternative HTML version deleted]] > > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
• 1.1k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
Actually, just realized that 'a' would need to be a RangesList, so it is more like: ## assuming 'a' is in chromosome order (which it probably is) a$overlap.with.b <- unlist(viewSums(Views(coverage(reduce(b)), as(a, "RangesList")))) It would be nice if we could perform Views aggregations with GRanges. One possibility would just be view* shortcuts for GRanges, like: a$overlap.with.b <- viewSums(coverage(reduce(b)), a) That gets around the need to represent a Views with GRanges. Without Views, you would need to find overlaps and aggregate conditional on the query index factor, instead of range-based aggregation. This approach is less natural in my opinion: hits <- findOverlaps(a, reduce(b)) w <- width(ranges(hits, a, b)) a$overlap.with.b <- sum(relist(w, as(hits, "List"))) On Fri, Mar 28, 2014 at 8:24 PM, Michael Lawrence <michafla@gene.com> wrote: > So it seems like from twitter that you still haven't solved this. I'm > pretty sure this is as simple as: > > a$overlap.with.b <- viewSums(Views(coverage(reduce(b)), a)) > > But maybe I'm not exactly understanding what you want. > > Michael > > > > On Mon, Jan 6, 2014 at 2:52 PM, Vince S. Buffalo <vsbuffalo@gmail.com>wrote: > >> Hi Bioconductor folks, >> >> I'm trying to create some GRanges summaries, but I think I may be missing >> an obvious solution. I have fixed-width windows as a GRanges object, and >> for each window/row I'd like to add a metadata column that indicates how >> many base pairs of this window overlap features in another GRange object. >> I'll need to add a few columns for different features in different GRanges >> objects. >> >> I've tried using the approach of findOverlaps, followed by ranges() to >> extract range widths. This creates an error: "'query' must be a Ranges of >> length equal to number of queries". I saw in the source >> that pintersect(query[queryHits(x)], subject[subjectHits(x)]) works too >> (and does without error). This returns the overlapping ranges, but it'd >> take a load of data munging to get it into the format I'd like -- it seems >> like I may be overlooking an easier solution. >> >> thanks, >> Vince >> >> !> sessionInfo() >> R version 3.0.2 (2013-09-25) >> Platform: x86_64-apple-darwin13.0.1 (64-bit) >> >> locale: >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] ggplot2_0.9.3.1 rtracklayer_1.22.0 GenomicRanges_1.14.3 >> [4] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 >> [7] ESSR_1.0.1 >> >> loaded via a namespace (and not attached): >> [1] Biostrings_2.30.1 bitops_1.0-6 BSgenome_1.30.0 >> colorspace_1.2-4 >> [5] compiler_3.0.2 dichromat_2.0-0 digest_0.6.4 grid_3.0.2 >> [9] gtable_0.1.2 labeling_0.2 MASS_7.3-29 >> munsell_0.4.2 >> [13] plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5 >> RCurl_1.95-4.1 >> [17] reshape2_1.2.2 Rsamtools_1.14.2 scales_0.2.3 >> stats4_3.0.2 >> [21] stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 >> zlibbioc_1.8.0 >> >> -- >> Vince Buffalo >> Ross-Ibarra Lab www.rilab.org) >> Plant Sciences, UC Davis >> >> [[alternative HTML version deleted]] >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Great, thanks! Your initial solution did work quite well actually, but it did take me some time to grok. I had to apply it again recently, and decided to take a second look (which is when I think I tweeted about it). In general, I do agree that it would be helpful if there were a reference for these types of tricks. I love GenomicRanges because with some playing around, anything is possible and surprisingly fast. But maybe a higher-level vignette covering efficiency tricks would be helpful? I think if intermediate to advanced learning material peeked under the hood of GenomicRanges a bit, readers might be able to build up their intuition. I find this is the best way to figure how why something may be slow and how to fix it. thanks again, Vince On Fri, Mar 28, 2014 at 11:42 PM, Michael Lawrence < lawrence.michael@gene.com> wrote: > Actually, just realized that 'a' would need to be a RangesList, so it is > more like: > > ## assuming 'a' is in chromosome order (which it probably is) > a$overlap.with.b <- unlist(viewSums(Views(coverage(reduce(b)), as(a, > "RangesList")))) > > It would be nice if we could perform Views aggregations with GRanges. One > possibility would just be view* shortcuts for GRanges, like: > a$overlap.with.b <- viewSums(coverage(reduce(b)), a) > > That gets around the need to represent a Views with GRanges. > > Without Views, you would need to find overlaps and aggregate conditional > on the query index factor, instead of range-based aggregation. This > approach is less natural in my opinion: > > hits <- findOverlaps(a, reduce(b)) > w <- width(ranges(hits, a, b)) > a$overlap.with.b <- sum(relist(w, as(hits, "List"))) > > > > > On Fri, Mar 28, 2014 at 8:24 PM, Michael Lawrence <michafla@gene.com>wrote: > >> So it seems like from twitter that you still haven't solved this. I'm >> pretty sure this is as simple as: >> >> a$overlap.with.b <- viewSums(Views(coverage(reduce(b)), a)) >> >> But maybe I'm not exactly understanding what you want. >> >> Michael >> >> >> >> On Mon, Jan 6, 2014 at 2:52 PM, Vince S. Buffalo <vsbuffalo@gmail.com>wrote: >> >>> Hi Bioconductor folks, >>> >>> I'm trying to create some GRanges summaries, but I think I may be missing >>> an obvious solution. I have fixed-width windows as a GRanges object, and >>> for each window/row I'd like to add a metadata column that indicates how >>> many base pairs of this window overlap features in another GRange object. >>> I'll need to add a few columns for different features in different >>> GRanges >>> objects. >>> >>> I've tried using the approach of findOverlaps, followed by ranges() to >>> extract range widths. This creates an error: "'query' must be a Ranges of >>> length equal to number of queries". I saw in the source >>> that pintersect(query[queryHits(x)], subject[subjectHits(x)]) works too >>> (and does without error). This returns the overlapping ranges, but it'd >>> take a load of data munging to get it into the format I'd like -- it seems >>> like I may be overlooking an easier solution. >>> >>> thanks, >>> Vince >>> >>> !> sessionInfo() >>> R version 3.0.2 (2013-09-25) >>> Platform: x86_64-apple-darwin13.0.1 (64-bit) >>> >>> locale: >>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 >>> >>> attached base packages: >>> [1] parallel stats graphics grDevices utils datasets methods >>> [8] base >>> >>> other attached packages: >>> [1] ggplot2_0.9.3.1 rtracklayer_1.22.0 GenomicRanges_1.14.3 >>> [4] XVector_0.2.0 IRanges_1.20.6 BiocGenerics_0.8.0 >>> [7] ESSR_1.0.1 >>> >>> loaded via a namespace (and not attached): >>> [1] Biostrings_2.30.1 bitops_1.0-6 BSgenome_1.30.0 >>> colorspace_1.2-4 >>> [5] compiler_3.0.2 dichromat_2.0-0 digest_0.6.4 grid_3.0.2 >>> [9] gtable_0.1.2 labeling_0.2 MASS_7.3-29 >>> munsell_0.4.2 >>> [13] plyr_1.8 proto_0.3-10 RColorBrewer_1.0-5 >>> RCurl_1.95-4.1 >>> [17] reshape2_1.2.2 Rsamtools_1.14.2 scales_0.2.3 >>> stats4_3.0.2 >>> [21] stringr_0.6.2 tools_3.0.2 XML_3.98-1.1 >>> zlibbioc_1.8.0 >>> >>> -- >>> Vince Buffalo >>> Ross-Ibarra Lab www.rilab.org) >>> Plant Sciences, UC Davis >>> >>> [[alternative HTML version deleted]] >>> >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > -- Vince Buffalo Ross-Ibarra Lab www.rilab.org) Plant Sciences, UC Davis [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 447 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6