Dear all,
following up on some previous posts (below), I was wondering if there is any other (simpler) way to intersect 2 GRanges and keep all the information in the metacolumns (perhaps using any newer package/newer operations in BioC ?);
https://stat.ethz.ch/pipermail/bioconductor/2013-August/054504.html
[GenomicRanges] subsetByOverlaps to keep info from both GRanges objects?
The example I have is :
library("GenomicRanges")
## DNAse gr2 <- GRanges("chr1", IRanges(8:12, width=5, names=paste0("dnase:", letters[1:5])), score2=10:14) ## SNP gr1 <- GRanges("chr1", IRanges(1:5, width=5, names=paste0("rsid:", letters[1:5])), score1=1:5) ## interaction : ranges <- subsetByOverlaps(gr2, gr1) hits <- findOverlaps(gr2, gr1) idx <- unique(subjectHits(hits))
## idx <- subjectHits(hits)
values <- DataFrame(rsid=names(gr1)[idx], snpscore=gr1$score1[idx]) mcols(ranges) <- c(mcols(ranges), values) ranges
thank you very much !
Hi,
Based on the code you show, and the links you provide, this is really about
subsetByOverlaps()
and not so much about intersecting ranges right? Intersecting GRanges objects (withintersect()
) is a very different operation. Could the title of your post reflect this by containing subsetByOverlaps instead of intersecting? This way other people interested in this functionality will be able to find it.So it sounds that you are after an enhanced version of
subsetByOverlaps()
that propagates the metadata columns of the 2nd argument (ranges
) to the result. After running your code, I end up with:Note that range
dnase:a
ingr2
overlaps with 2 ranges ingr1
(rsid:d
andrsid:e
) but you've somehow made the decision to report only its overlap withrsid:d
. Is that itended? If yes, what criteria you used to make this choice? If not, what would you expect the output to look like?Also your 2 GRanges objects share the same metadata column (score). Is that on purpose? That's another difficulty that an enhanced version of
subsetByOverlaps()
would need to deal with.H.
Dear Herve, you are so fast ! many thanks for your help ! I am going now over your comments ... have a good night !
Dear Herve,
please, could you let me know, how would you write/re-write the R code (above) in order to keep :
-- only the GRanges that overlap
-- for the GRanges that overlap, to keep all the info/metadata from both gr2 and gr1
gr2 <- GRanges("chr1", IRanges(8:12, width=5, names=paste0("dnase:", letters[1:5])), score2=10:14)
gr1 <- GRanges("chr1", IRanges(1:5, width=5, names=paste0("rsid:", letters[1:5])), score1=1:5)
and, yes, you are right, we shall report all the overlaps (including
dnase:a
ingr2
overlaps with 2 ranges ingr1
(rsid:d
andrsid:e
).thanks a lot !
So how would you like the output to look like? Can you forge the output "by hand" (for input
gr2
andgr1
) and show it here so I can see/understand what you would expect this enhancedsubsetByOverlaps()
to return?Note that GRanges objects are not particularly good/convenient for representing one-to-many relationships (it requires storing list-like objects in the metadata columns). Have you considered using a GRangesList object parallel to
gr2
for the output? Before writing any code, it's always worth spending some time thinking about the best way to represent the output.H.
Dear Herve, thank you for your help.
I was looking more into having a "bedtools" like functionality in subsetByOverlaps, i.e. being able to do the overlap in one command, having the option of keeping all the metadata in these GRanges gr1 and gr2.