Question

Intersecting two chromosomal ranges but keeping the unique ones

0

Entering edit mode

Yadav Sapkota ▴ 130

@yadav-sapkota-5156

Last seen 10.3 years ago

Hi, I am trying to create a redundant list from two different lists of chromosomal ranges as follow: *List 1:* Chr Start Stop chr1 0 98595 chr1 17012368 17012439 chr1 17017304 17029919 chr1 17246620 17250887 *List 2:* Chr Start Stop chr1 82483 98595 chr1 17225296 17232195 chr1 17235775 17236214 chr1 17246096 17257401 >From these two lists, I want to create one list by (i) intersecting the regions if they overlap between two lists but (ii) also keeping the ones unique in either of the lists. So far, I can do the intersection using GRanges (suggested by Martin Morgan) but can not get the unique regions only present in either groups. I am using the folowing code: source("http://bioconductor.org/biocLite.R") biocLite("GenomicRanges") library(GenomicRanges) rel=read.csv('file1.csv', header=T, sep=',') nonrel=read.csv(file2.csv', header=T, sep=',') subject=GRanges(rel$Chr, IRanges(rel$Start, rel$Stop)) Query=GRanges(nonrel$Chr, IRanges(nonrel$Start, nonrel$Stop)) result=as.data.frame(intersect (subject,Query)) write.table(result, file='Result.csv', sep=',') Could anyone tell me how to insert a piece of code to keep the unique regions in either of the lists? Regards, --Yadav [[alternative HTML version deleted]]

• 1.5k views

ADD COMMENT • link updated 12.8 years ago by Steve Lianoglou ★ 13k • written 12.8 years ago by Yadav Sapkota ▴ 130

score 0 · Answer 1 · 2012-03-12

Hi, On Mon, Mar 12, 2012 at 11:41 AM, Yadav Sapkota <ysapkota at="" ualberta.ca=""> wrote: > Hi, > > I am trying to create a redundant list from two different lists of > chromosomal ranges as follow: > > *List 1:* > ? Chr Start Stop ?chr1 0 98595 ?chr1 17012368 17012439 ?chr1 17017304 > 17029919 ?chr1 17246620 17250887 > *List 2:* > ? Chr Start Stop ?chr1 82483 98595 ?chr1 17225296 17232195 ?chr1 17235775 > 17236214 ?chr1 17246096 17257401 > > > >From these two lists, I want to create one list by (i) intersecting the > regions if they overlap between two lists but (ii) also keeping the ones > unique in either of the lists. So far, I can do the intersection using > GRanges (suggested by Martin Morgan) but can not get the unique regions > only present in either groups. I am using the folowing code: > > source("http://bioconductor.org/biocLite.R") > biocLite("GenomicRanges") > library(GenomicRanges) > rel=read.csv('file1.csv', header=T, sep=',') > nonrel=read.csv(file2.csv', header=T, sep=',') > subject=GRanges(rel$Chr, IRanges(rel$Start, rel$Stop)) > Query=GRanges(nonrel$Chr, IRanges(nonrel$Start, nonrel$Stop)) > result=as.data.frame(intersect (subject,Query)) > write.table(result, file='Result.csv', sep=',') > > Could anyone tell me how to insert a piece of code to keep the unique > regions in either of the lists? Sure, skim through the ?findOverlaps docs as well as its `see also` section. For instance, a "low level" way to see which regions in subject don't appear in query: subject.only <- countOverlaps(subject, Query) == 0 But `match` and `%in%` are overloaded to work with *Ranges-like objects, so you can play around with those to see which ones you prefer. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact