Is a number within a set of ranges?
Daniel Brewer ★ 1.9k
@daniel-brewer-1791
Last seen 10.4 years ago
I have a table with a start and stop column which defines a set of ranges. I have another table with a list of genes with associated position. What I would like to do is subset the gene table so it only contains genes whose position is within any of the ranges. What is the best way to do this? The only way I can think of is to construct a long list of conditions linked by ORs but I am sure there must be a better way.

Simple example:

Start  Stop
1      3
5      9
13     15

Gene  Position
1     14
2     4
3     10
4     6

I would like to get out:

Gene  Position
1     14
4     6

Any ideas?

Thanks

Dan

--
Daniel Brewer, Ph.D.
Institute of Cancer Research
Email: daniel.brewer at icr.ac.uk

The Institute of Cancer Research: Royal Cancer Hospital, a charitable Company Limited by Guarantee, Registered in England under Company No. 534147 with its Registered Office at 123 Old Brompton Road, London SW7 3RP.
Cancer Cancer • 2.2k views
Artur Veloso ▴ 340
@artur-veloso-2062
Last seen 10.4 years ago
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20071029/ed7b606c/attachment.pl
You can use cut (see ?cut), defining the breaks from your ranges, as they are non-overlapping.

Regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
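One way the cut() suggestion could be sketched on the example data is shown below. The names ranges, genes, breaks, bin and in_range are illustrative, not from the thread. The sketch assumes integer positions and sorted, non-overlapping ranges separated by at least one position; the breaks are widened by 0.5 so that both the Start and the Stop of each range count as inside.

```r
# Example data from the question
ranges <- data.frame(Start = c(1, 5, 13), Stop = c(3, 9, 15))
genes  <- data.frame(Gene = 1:4, Position = c(14, 4, 10, 6))

# Interleave Start and Stop into one increasing vector of breaks, widened
# by 0.5 so both endpoints of each range fall inside its bin.
# Assumption: integer positions; sorted, non-overlapping, non-adjacent ranges.
breaks <- as.vector(rbind(ranges$Start - 0.5, ranges$Stop + 0.5))

# labels = FALSE returns the bin number; NA means outside all breaks
bin <- cut(genes$Position, breaks = breaks, labels = FALSE)

# Odd-numbered bins are the ranges, even-numbered bins are the gaps
in_range <- !is.na(bin) & bin %% 2 == 1

genes[in_range, ]
```

If the ranges can touch or overlap, the breaks are no longer strictly increasing and a direct comparison against Start and Stop is the safer choice.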
You would like to avoid loops here, especially nested loops: this is what apply, sapply etc. are for. Using your syntax:

final.presence = apply(position, 1, function(x) any(x[2]>=place$start & x[2]<=place$stop))

- Dr Oleg Sklyar * EMBL-EBI, Cambridge CB10 1SD, UK * +441223494466

On Mon, 2007-10-29 at 12:42 -0500, Artur Veloso wrote:
> Hi Daniel,
>
> I'm very new to R and I'm far from a good programmer, but I think that this
> small script should solve your problem. Well, at least for the example you
> provided it worked. I hope it helps.
>
> Cheers,
>
> Artur
>
> > start <- c(1,5,13)
> > stop <- c(3,9,15)
> > place <- data.frame(start,stop)
> >
> > gene <- c(1,2,3,4)
> > position <- c(14,4,10,6)
> > position <- data.frame(gene,position)
> >
> > range <- list()
> > for(a in 1:dim(place)[1])
> + range[[a]] <- seq(place$start[a],place$stop[a])
> >
> > presence <- NULL
> > final.presence <- NULL
> > for(b in position$position)
> + {
> + for(c in 1:length(range))
> + {
> + presence <- c(presence,b%in%range[[c]])
> + }
> + final.presence <- c(final.presence,as.logical(sum(presence)))
> + presence <- NULL
> + }
> >
> > position[final.presence,]
>   gene position
> 1    1       14
> 4    4        6
In this case you don't gain much, if anything, by using apply(), which is just a nice wrapper to a for() loop (and the bad rap that for loops have in R isn't really applicable these days).

The real gain to be had is from vectorizing the comparison.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623
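The point about vectorizing the comparison can be illustrated with a small sketch using the place and position objects from Artur's script. The helper in_any_range and the result names keep_loop and keep_vapply are hypothetical. The inner test compares one position against all ranges in a single vectorized expression, and the outer iteration is written once as a for() loop and once with vapply() to show that both give the same answer.

```r
place    <- data.frame(start = c(1, 5, 13), stop = c(3, 9, 15))
position <- data.frame(gene = 1:4, position = c(14, 4, 10, 6))

# Inner test: one position against every range in one vectorized comparison
in_any_range <- function(p) any(p >= place$start & p <= place$stop)

# Outer iteration written as an explicit loop over genes ...
keep_loop <- logical(nrow(position))
for (i in seq_len(nrow(position))) {
  keep_loop[i] <- in_any_range(position$position[i])
}

# ... and as vapply(); both call the helper once per gene
keep_vapply <- vapply(position$position, in_any_range, logical(1))

identical(keep_loop, keep_vapply)
position[keep_loop, ]
```

Either form replaces the nested loop and the explicit seq() expansion of each range, which is where most of the work in the original script goes.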
It's about both, and in fact after scrolling down I noticed that we came up with exactly the same solution :)

- Dr Oleg Sklyar * EMBL-EBI, Cambridge CB10 1SD, UK * +441223494466
@sean-davis-490
Last seen 5 months ago
United States
Here is a function that I use for finding overlapping segments. It takes two data.frames, x and y. Each must have "Chr", "Position", and "end" columns (often used in conjunction with snapCGH--hence, the Position rather than "start"). The "shift" parameter is a convenience for doing "random shift" random distributions of genomic segments. The function returns the indexes of x and y that overlap. So, if the first row of the x data.frame overlaps with the first 3 rows of y, the output will be:

Xindex Yindex
1      1
1      2
1      3

Note that the data.frames can have more than those three columns, but those three columns MUST be present and named as mentioned.

Hope this helps.

Sean

Attached function below
-----------------------

findOverlappingSegments <-
  function(x,y,shift=0) {
    # want to have larger set first for speed
    swap <- nrow(x)<nrow(y)
    if(swap) {
      tmpx <- x
      x <- y
      y <- tmpx
    }
    intersectChrom <- intersect(x$Chr,y$Chr)
    ret <- list()
    for(i in intersectChrom) {
      aindex <- which(y$Chr==i)
      bindex <- which(x$Chr==i)
      a <- y[aindex,]
      b <- x[bindex,]
      # for each segment of a, the rows of b that it overlaps;
      # SIMPLIFY=FALSE keeps a list even when all hit counts are equal
      overlapsBrow <- mapply(function(Astart, Aend) {
        which((Astart <= b$end & Astart>=b$Position) |
              (Aend <= b$end & Aend>=b$Position) |
              (Astart <= b$Position & Aend>=b$end) |
              (Astart >= b$Position & Aend<=b$end))
      },a$Position+shift,a$end+shift,SIMPLIFY=FALSE)
      tmp1 <- unlist(overlapsBrow)
      xindex <- bindex[tmp1]
      yindex <- aindex[rep(1:nrow(a),sapply(overlapsBrow,length,simplify=TRUE))]
      if(swap) {
        ret[[i]] <- cbind(yindex,xindex)
      } else {
        ret[[i]] <- cbind(xindex,yindex)
      }
      colnames(ret[[i]]) <- c('Xindex','Yindex')
    }
    return(do.call(rbind,ret))
  }
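The four-clause overlap condition inside findOverlappingSegments can, as far as I can tell, be reduced to two comparisons when every segment satisfies start <= end. The sketch below is not part of Sean's function: the names overlaps, overlaps_four, grid and agree are illustrative, and the check only covers small integer intervals.

```r
# Closed intervals [a_start, a_end] and [b_start, b_end] overlap
# exactly when each one starts no later than the other ends.
overlaps <- function(a_start, a_end, b_start, b_end) {
  a_start <= b_end & a_end >= b_start
}

# The four-clause form used in the function above, for comparison
overlaps_four <- function(a_start, a_end, b_start, b_end) {
  (a_start <= b_end & a_start >= b_start) |
    (a_end <= b_end & a_end >= b_start) |
    (a_start <= b_start & a_end >= b_end) |
    (a_start >= b_start & a_end <= b_end)
}

# Compare both tests on every pair of valid intervals with endpoints in 1..6
grid <- expand.grid(a_start = 1:6, a_end = 1:6, b_start = 1:6, b_end = 1:6)
grid <- grid[grid$a_start <= grid$a_end & grid$b_start <= grid$b_end, ]
agree <- all(with(grid, overlaps(a_start, a_end, b_start, b_end) ==
                        overlaps_four(a_start, a_end, b_start, b_end)))
agree
```

For the original question a gene position is just a segment whose start equals its end, so the same test applies with a_start and a_end both set to the position.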
Or a more simplistic alternative that will work with the data provided:

> mat <- matrix(c(1,5,13,3,9,15), ncol=2)
> gn <- matrix(c(14,4,10,6), ncol=1)
> a <- apply(gn, 1, function(x) any(x > mat[,1] & x < mat[,2]))
> gn[a,]
[1] 14  6

Best,

Jim
@james-w-macdonald-5106
Last seen 15 hours ago
United States
Hi Dan,

Are you not telling us something here? Because the problem as stated is very simple. Say your matrix is called mat:

index <- mat[,1] < 6 & mat[,2] < 15

Or do you have a whole bunch of ranges to test?

Best,

Jim
@christos-hatzis-1614
Last seen 10.4 years ago
> pos <- matrix(c(1, 5, 13, 3, 9, 15), ncol=2)
> pos
     [,1] [,2]
[1,]    1    3
[2,]    5    9
[3,]   13   15
> gene.pos <- c(14,4,10,6)
> gene.pos
[1] 14  4 10  6
> within <- sapply(gene.pos, function(g) any(apply(pos, 1, function(x) findInterval(g, x)) == 1))
> gene.pos[within]
[1] 14  6

Look at ?findInterval, which does all the work. It returns 1 if within range in this case.

-Christos
Good to know the existence of findInterval(). Thanks!

For this particular case though, I would be tempted to keep things simple by replacing this

  any(apply(pos, 1, function(x) findInterval(g, x)) == 1)

by

  any(apply(pos, 1, function(x) x[1] <= g && g <= x[2]))

Not only is the latter easier to understand, but with the former, you'll get wrong results if one of your genes is positioned at one of the Stop positions:

  gene.pos <- c(14,4,10,6,15) # last gene is at a Stop position

  # using findInterval() gives:
  > within
  [1]  TRUE FALSE FALSE  TRUE FALSE

  # using 'x[1] <= g && g <= x[2]' gives:
  > within
  [1]  TRUE FALSE FALSE  TRUE  TRUE

Note that the "findInterval" method can be fixed by specifying 'rightmost.closed=TRUE', but this doesn't make the code easier to understand, quite the contrary...

Cheers,
H.
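The boundary behaviour described here can be reproduced with a short sketch. The helper within_fi and the result names default_fi, closed_fi and direct are hypothetical; pos and gene.pos follow the thread. It compares findInterval() with and without rightmost.closed=TRUE against the direct comparison.

```r
pos      <- matrix(c(1, 5, 13, 3, 9, 15), ncol = 2)
gene.pos <- c(14, 4, 10, 6, 15)   # last gene sits exactly on a Stop position

# Hypothetical helper: is g inside any row of pos according to findInterval()?
within_fi <- function(g, closed) {
  any(apply(pos, 1, function(x) findInterval(g, x, rightmost.closed = closed)) == 1)
}

default_fi <- sapply(gene.pos, within_fi, closed = FALSE)  # Stop is excluded
closed_fi  <- sapply(gene.pos, within_fi, closed = TRUE)   # Stop is included
direct     <- sapply(gene.pos, function(g) any(pos[, 1] <= g & g <= pos[, 2]))

rbind(default_fi, closed_fi, direct)
```

With the default settings each interval is treated as closed on the left and open on the right, which is why only the gene at a Stop position differs between the first row and the other two.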
@joern-toedling-1244
Last seen 10.4 years ago
Hi Daniel,

I think you could do something smarter using the "outer" function here. Let's say your matrix of intervals is "ints" and the Position column of your genes-position matrix is "pos"; then something like this should give you only the positions of those genes inside those intervals:

pos[which(rowSums(outer(pos,ints[,"Stop"],"<=") & outer(pos,ints[,"Start"],">="))>0)]

Maybe there's even a smarter way that I do not know of.

Regards,
Joern
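To make the outer() idea concrete, here is a minimal sketch on the example data. ints and pos are built as the message describes; the intermediate names below_stop, above_start and inside are added for illustration only.

```r
ints <- cbind(Start = c(1, 5, 13), Stop = c(3, 9, 15))
pos  <- c(14, 4, 10, 6)

# Each outer() call yields a genes-by-intervals logical matrix
below_stop  <- outer(pos, ints[, "Stop"],  "<=")
above_start <- outer(pos, ints[, "Start"], ">=")

# A gene is kept if at least one interval satisfies both conditions
inside <- rowSums(below_stop & above_start) > 0
pos[inside]
```

This avoids any explicit loop, at the cost of two logical matrices with one cell per gene and interval pair, so memory use grows with the product of the two table sizes.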
Oleg Sklyar ▴ 260
@oleg-sklyar-1882
Last seen 10.4 years ago
This is a trivial one-liner:

r = data.frame(Start=c(1,5,13), End=c(3,9,15))
g = data.frame(Gene=c(1,2,3,4), Position=c(14,4,10,6))
index = apply(g, 1, function(x) any(x[2]>=r$Start & x[2]<=r$End))

> index
[1]  TRUE FALSE FALSE  TRUE
> g[index,]
  Gene Position
1    1       14
4    4        6

Best,
Oleg
Daniel Brewer ★ 1.9k
@daniel-brewer-1791
Last seen 10.4 years ago
Thanks everyone for their ideas. That is marvellous.

Dan