Intersecting two chromosomal ranges but keeping the unique ones
1
0
Entering edit mode
Yadav Sapkota ▴ 130
@yadav-sapkota-5156
Last seen 10.1 years ago
Hi, I am trying to create a redundant list from two different lists of chromosomal ranges as follow: *List 1:* Chr Start Stop chr1 0 98595 chr1 17012368 17012439 chr1 17017304 17029919 chr1 17246620 17250887 *List 2:* Chr Start Stop chr1 82483 98595 chr1 17225296 17232195 chr1 17235775 17236214 chr1 17246096 17257401 >From these two lists, I want to create one list by (i) intersecting the regions if they overlap between two lists but (ii) also keeping the ones unique in either of the lists. So far, I can do the intersection using GRanges (suggested by Martin Morgan) but can not get the unique regions only present in either groups. I am using the folowing code: source("http://bioconductor.org/biocLite.R") biocLite("GenomicRanges") library(GenomicRanges) rel=read.csv('file1.csv', header=T, sep=',') nonrel=read.csv(file2.csv', header=T, sep=',') subject=GRanges(rel$Chr, IRanges(rel$Start, rel$Stop)) Query=GRanges(nonrel$Chr, IRanges(nonrel$Start, nonrel$Stop)) result=as.data.frame(intersect (subject,Query)) write.table(result, file='Result.csv', sep=',') Could anyone tell me how to insert a piece of code to keep the unique regions in either of the lists? Regards, --Yadav [[alternative HTML version deleted]]
• 1.4k views
ADD COMMENT
0
Entering edit mode
@steve-lianoglou-2771
Last seen 19 months ago
United States
Hi, On Mon, Mar 12, 2012 at 11:41 AM, Yadav Sapkota <ysapkota at="" ualberta.ca=""> wrote: > Hi, > > I am trying to create a redundant list from two different lists of > chromosomal ranges as follow: > > *List 1:* > ? Chr Start Stop ?chr1 0 98595 ?chr1 17012368 17012439 ?chr1 17017304 > 17029919 ?chr1 17246620 17250887 > *List 2:* > ? Chr Start Stop ?chr1 82483 98595 ?chr1 17225296 17232195 ?chr1 17235775 > 17236214 ?chr1 17246096 17257401 > > > >From these two lists, I want to create one list by (i) intersecting the > regions if they overlap between two lists but (ii) also keeping the ones > unique in either of the lists. So far, I can do the intersection using > GRanges (suggested by Martin Morgan) but can not get the unique regions > only present in either groups. I am using the folowing code: > > source("http://bioconductor.org/biocLite.R") > biocLite("GenomicRanges") > library(GenomicRanges) > rel=read.csv('file1.csv', header=T, sep=',') > nonrel=read.csv(file2.csv', header=T, sep=',') > subject=GRanges(rel$Chr, IRanges(rel$Start, rel$Stop)) > Query=GRanges(nonrel$Chr, IRanges(nonrel$Start, nonrel$Stop)) > result=as.data.frame(intersect (subject,Query)) > write.table(result, file='Result.csv', sep=',') > > Could anyone tell me how to insert a piece of code to keep the unique > regions in either of the lists? Sure, skim through the ?findOverlaps docs as well as its `see also` section. For instance, a "low level" way to see which regions in subject don't appear in query: subject.only <- countOverlaps(subject, Query) == 0 But `match` and `%in%` are overloaded to work with *Ranges-like objects, so you can play around with those to see which ones you prefer. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD COMMENT

Login before adding your answer.

Traffic: 1085 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6