findOverlaps suggestions


Janet Young

Fred Hutchinson Cancer Research Center,…

Hi,

I'm not sure if this is a suggestion for an enhancement to IRanges, or a question about whether efficient code already exists to do what I want; I might have missed something.

I have a single dataset of genomic regions (as a RangedData object), a minority of which overlap with one another, and I'm interested in looking at which ones overlap. I'll illustrate what I mean using example data from the findOverlaps help page:

    query <- IRanges(c(1, 4, 9), c(5, 7, 10))

First, a trivial cosmetic comment:

    findOverlaps(query)

works fine (it knows that I mean subject=query), but if query is a RangedData object, it doesn't work unless I specify both subject and query:

    query_RD <- RangedData(query, space="chr1")
    findOverlaps(query_RD)
    Error in function (classes, fdef, mtable) :
      unable to find an inherited method for function "findOverlaps", for signature "RangedData", "missing"

Instead we need to specify both arguments, like so:

    findOverlaps(query_RD, query_RD)

(No big deal, I know, but it could be good to fix for consistency.)

Second, a truly functional comment. In the special case when query=subject, it would be really nice to have an option not to report self-self matches, by which I mean that only the second and third rows of the following example are really interesting:

    findOverlaps(query)
    An object of class "RangesMatching"
    Slot "matchMatrix":
         query subject
    [1,]     1       1
    [2,]     1       2
    [3,]     2       1
    [4,]     2       2
    [5,]     3       3

Even nicer would be to report each symmetrical match only once, not twice (i.e. tell me that 1 matches 2, but there's no need to also tell me that 2 matches 1).

I think I can figure out the code to do each of those things the long way around, but it'd be great to have it built in. (Is it already?)

What do you think? I imagine this could be useful to others too.

thanks,

Janet Young

-------------------------------------------------------------------
Dr. Janet Young (Trask lab)
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue N., C3-168,
P.O. Box 19024, Seattle, WA 98109-1024, USA.
tel: (206) 667 1471 fax: (206) 667 6524
email: jayoung ...at... fhcrc.org
http://www.fhcrc.org/labs/trask/
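To make the example concrete without IRanges, here is a minimal base R sketch of a self-overlap search. The helper name `self_overlaps` is hypothetical, and the naive all-pairs comparison is a stand-in for illustration, not IRanges' own algorithm. It treats each range as a closed interval [start, end], as IRanges does, and reports every (query, subject) pair that shares at least one position, which reproduces the five rows of the findOverlaps(query) output in the post.

```r
# Hypothetical helper: naive all-pairs self-overlap search in base R.
# A sketch for illustration only, not how IRanges finds overlaps.
self_overlaps <- function(start, end) {
  stopifnot(length(start) == length(end))
  n <- length(start)
  # All ordered (query, subject) index pairs; expand.grid varies its first
  # argument fastest, so rows come out sorted by query, then subject.
  pairs <- expand.grid(subject = seq_len(n), query = seq_len(n))
  q <- pairs$query
  s <- pairs$subject
  # Closed intervals [start, end] overlap when each starts before the other ends.
  hit <- start[q] <= end[s] & start[s] <= end[q]
  cbind(query = q[hit], subject = s[hit])
}

# Same ranges as IRanges(c(1, 4, 9), c(5, 7, 10)) in the post.
mat <- self_overlaps(c(1, 4, 9), c(5, 7, 10))
mat  # rows (1,1), (1,2), (2,1), (2,2), (3,3)
```

Comparing all pairs costs O(n^2) in the number of ranges, which is fine for a handful of regions but not for a genome-scale dataset; that is the case a dedicated overlap routine such as findOverlaps is meant for.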


Michael Lawrence

United States

On Thu, Jan 21, 2010 at 12:07 PM, Janet Young <jayoung@fhcrc.org> wrote:

> but if query is a RangedData object, it doesn't work unless I specify both
> subject and query.
> query_RD <- RangedData(query,space="chr1")
> findOverlaps(query_RD)
> Error in function (classes, fdef, mtable) :
> unable to find an inherited method for function "findOverlaps", for
> signature "RangedData", "missing"

Thanks for pointing this out.

> Second, a truly functional comment. In the special case when query=subject,
> it would be really nice to have an option not to report self-self matches
> [...]
> Even nicer would be to only report each symmetrical match once, not twice
> (i.e. tell me that 1 matches 2, but no need to also tell me that 2 matches 1).
>
> I think I can figure out the code to do each of those things the long way
> around, but it'd be great to have it built in. (is it already?)
I think there is code to do this that is commented out in that method. Yes, I found it:

    ### FIXME: perhaps support a "simplify" option that does this:
    ## mat <- matchMatrix(result)
    ## mat <- mat[mat[,1L] != mat[,2L],]
    ## norm_mat <- cbind(pmin(mat[,1L], mat[,2L]), pmax(mat[,1L], mat[,2L]))
    ## mat <- mat[!duplicated(norm_mat),]
    ## result@matchMatrix <- mat

So, we could go ahead and add that option. Not sure "simplify" would be a good word. Or would we want two separate options, one to remove the obvious self hits and another to remove the redundant hits? Maybe 'ignoreSelf' and then "normalize" or something? Suggestions welcome.

Michael
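The commented-out FIXME logic can be turned into a runnable base R sketch. This version (the helper name `simplify_hits` is hypothetical) works on a plain two-column query/subject matrix rather than on a RangesMatching object, and adds `drop = FALSE` so that a single remaining row stays a matrix instead of collapsing to a vector. Applied to the five hits from the thread's example, it keeps only the pair (1, 2).

```r
# Hypothetical helper: the FIXME's self/redundant filtering on a plain
# two-column (query, subject) matrix instead of a RangesMatching object.
simplify_hits <- function(mat) {
  # Drop self-self hits; drop = FALSE keeps a matrix even if one row remains.
  mat <- mat[mat[, 1L] != mat[, 2L], , drop = FALSE]
  # Normalize each pair so that (2, 1) and (1, 2) become identical rows.
  norm_mat <- cbind(pmin(mat[, 1L], mat[, 2L]), pmax(mat[, 1L], mat[, 2L]))
  # duplicated() on a matrix compares whole rows; keep the first of each pair.
  mat[!duplicated(norm_mat), , drop = FALSE]
}

# The five hits reported by findOverlaps(query) in the original post.
hits <- cbind(query = c(1L, 1L, 2L, 2L, 3L), subject = c(1L, 2L, 1L, 2L, 3L))
simplify_hits(hits)  # one row: query 1, subject 2
```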


Thanks Michael,

On Jan 21, 2010, at 1:49 PM, Michael Lawrence wrote:

> So, we could go ahead and add that option.

That would be great, thank you.

> Not sure "simplify" would be a good word. Or would we want two
> separate options, one to remove the obvious self hits and another to
> remove the redundant hits?

I think keeping the two options separate is probably better?

> Maybe 'ignoreSelf' and then "normalize" or something? Suggestions
> welcome.

Good. I think ignoreSelf is very clear (and you're right, simplify isn't so good). "normalize" makes me think of other things, so maybe not so good. "ignoreSymmetric" perhaps? (It only makes sense if query=subject; perhaps there could be a check for that if the user tries to invoke that option?)

Janet


On Thu, Jan 21, 2010 at 3:04 PM, Janet Young <jayoung@fhcrc.org> wrote:

> "ignoreSymmetric" perhaps? (only makes sense if query=subject - perhaps
> there could be a check for that if user tries to invoke that option?)

I don't know about ignoreSymmetric, since we're only ignoring the redundancy, not the matching itself. Maybe ignoreRedundant?

Also, these parameters will only be for the method where the subject is missing, so we do not have to do any special checks.

Michael
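The design settled on here is two independent switches rather than one "simplify" flag. A minimal base R sketch of that shape follows; `filter_self_hits` is a hypothetical helper that borrows the thread's proposed names `ignoreSelf` and `ignoreRedundant`, and it is not the IRanges implementation. It also swaps in a different redundancy test from the FIXME: instead of pmin/pmax plus duplicated(), it keeps only hits with query <= subject. That assumes the overlap relation is symmetric (true for plain "any" overlap, not for directional relations such as one range lying within another). Later IRanges releases appear to expose this behaviour as `drop.self` and `drop.redundant` arguments; check `?findOverlaps` for your version.

```r
# Hypothetical helper mirroring the two options discussed in this thread.
# Assumes a symmetric overlap relation: every (i, j) hit has a (j, i) mirror.
filter_self_hits <- function(mat, ignoreSelf = FALSE, ignoreRedundant = FALSE) {
  if (ignoreSelf) {
    # Drop i-i hits; drop = FALSE keeps a matrix even if one row is left.
    mat <- mat[mat[, 1L] != mat[, 2L], , drop = FALSE]
  }
  if (ignoreRedundant) {
    # Keep one hit of each mirrored pair; self hits (i, i) pass this test.
    mat <- mat[mat[, 1L] <= mat[, 2L], , drop = FALSE]
  }
  mat
}

hits <- cbind(query = c(1L, 1L, 2L, 2L, 3L), subject = c(1L, 2L, 1L, 2L, 3L))
filter_self_hits(hits, ignoreSelf = TRUE)                         # (1, 2), (2, 1)
filter_self_hits(hits, ignoreRedundant = TRUE)                    # (1, 1), (1, 2), (2, 2), (3, 3)
filter_self_hits(hits, ignoreSelf = TRUE, ignoreRedundant = TRUE) # (1, 2)
```

Keeping the flags independent means ignoreRedundant alone still reports the self hits, which is the behaviour the two-option design is meant to allow.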


Excellent. I think ignoreRedundant makes sense.

thanks!
