How to convert from IRanges(List) to Rle(List)
@delhommeemblde-3232
Last seen 10.2 years ago
Hi all,

I'm wondering whether there is a direct way to convert an IRanges to an Rle, as in as(rng, "Rle"). At the moment I convert my IRanges into an integer vector and cast that as an Rle, i.e. Rle(as.integer(rng)), but that is not very efficient on a long IRangesList (with > 700,000 IRanges in it): it takes ~10 mins with an sapply.

Here is why I want this: I have an IRangesList of transcripts (describing exons at the genome level), and for every transcript I have a bp position at the transcript level that I want to convert into a genomic bp position. Basically, I need to be able to convert a given transcript coordinate into the corresponding genomic coordinate. My IRanges contain the genomic coordinates of every transcript, so by converting one into an integer vector I can select the right genomic bp coordinate by using the transcript bp coordinate as an index: as.integer(rng)[transcript.pos].

I considered the IRanges approach because I keep the transcript name and I'm sure that I'm looking up the right coordinate in the right transcript, but I'm open to other suggestions.

Thanks for any pointers,

Cheers,
Nico

---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209, 69102 Heidelberg, Germany
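The lookup described above can also be done without expanding every range into a full integer vector. What follows is only a minimal base-R sketch, assuming a single plus-strand transcript whose exons are listed in transcript order. The helper name tx_to_genome is hypothetical (it is not an IRanges function); for an IRanges object, the two input vectors would come from start(rng) and width(rng).

```r
# Hypothetical helper, not part of IRanges: map 1-based transcript positions
# to genomic positions for ONE plus-strand transcript.
# exon_start / exon_width: genomic start and width of each exon, in transcript order.
tx_to_genome <- function(tx_pos, exon_start, exon_width) {
  cum_end <- cumsum(exon_width)  # transcript coordinate of the last base of each exon
  stopifnot(all(tx_pos >= 1L), all(tx_pos <= cum_end[length(cum_end)]))
  # Index of the exon that contains each transcript position.
  exon_idx <- findInterval(tx_pos, cum_end, left.open = TRUE) + 1L
  # 1-based offset of the position inside its exon.
  offset <- tx_pos - (cum_end[exon_idx] - exon_width[exon_idx])
  exon_start[exon_idx] + offset - 1L
}

# Two exons: genomic 101-105 (5 bp) and 201-203 (3 bp).
exon_start <- c(101L, 201L)
exon_width <- c(5L, 3L)
tx_pos <- c(1L, 5L, 6L, 8L)

result <- tx_to_genome(tx_pos, exon_start, exon_width)
result  # 101 105 201 203

# Same answer as the "expand everything, then index" approach from the post.
expanded <- c(101:105, 201:203)
stopifnot(all(result == expanded[tx_pos]))
```

The work per transcript is proportional to the number of exons and queried positions rather than to the transcript length, which is the point of the exercise. Minus-strand transcripts would need the exon order and the offset direction reversed; that case is not handled here.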
@martin-morgan-1513
Last seen 3 months ago
United States
On 04/07/2012 05:39 AM, Nicolas Delhomme wrote:
> Basically, I need to be able to convert a given transcript coordinate
> into the corresponding genomic coordinate.

Hi Nico,

VariantAnnotation::refLocsToLocalLocs, GenomicFeatures::transcriptLocs2refLocs and IRanges::map might do this for you; I have no direct experience with them, though.

Martin

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
On Sat, Apr 7, 2012 at 11:12 AM, Martin Morgan wrote:
> Hi Nico -- VariantAnnotation::refLocsToLocalLocs,
> GenomicFeatures::transcriptLocs2refLocs and IRanges::map might do this
> for you; no direct experience on my part, though.

Right. At the moment, IRanges::map takes things from global to local coordinates (either into transcripts or into reads, depending on the argument). This takes the place of "refLocsToLocalLocs". What "map" still needs to support is the reverse. I think we could do this with a new function. I am not sure it should be called reverseMap, though, because it's not clear which direction is forward and which is reverse. Maybe we need mapToGlobal and mapToLocal? Or maybe "absolute" and "relative" are better terms?

Btw, we are working on an "easier to use" interface for the transcriptLocsToRefLocs function, and that should be integrated with any refactoring/renaming.

Let's get a discussion going.

Michael
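To make the "global to local" direction concrete, here is a rough base-R sketch of what such a mapping computes for one transcript. It is not the IRanges::map implementation, and genome_to_tx is a hypothetical name used only for illustration. It assumes a plus-strand transcript with non-overlapping exons sorted by genomic start, and returns NA for positions that fall outside every exon.

```r
# Hypothetical helper: genomic ("global") -> transcript ("local") positions
# for ONE plus-strand transcript with sorted, non-overlapping exons.
genome_to_tx <- function(g_pos, exon_start, exon_width) {
  exon_end <- exon_start + exon_width - 1L
  # Number of transcript bases that precede each exon.
  before <- c(0L, cumsum(exon_width))[seq_along(exon_width)]
  # Last exon starting at or before each genomic position (0 if none).
  exon_idx <- findInterval(g_pos, exon_start)
  inside <- exon_idx > 0L
  # Keep only positions that do not run past the end of that exon.
  inside[inside] <- g_pos[inside] <= exon_end[exon_idx[inside]]
  out <- rep(NA_integer_, length(g_pos))
  hit <- exon_idx[inside]
  out[inside] <- before[hit] + (g_pos[inside] - exon_start[hit]) + 1L
  out
}

# Two exons: genomic 101-105 and 201-203.
exon_start <- c(101L, 201L)
exon_width <- c(5L, 3L)

# 150 lies in the intron, 50 and 204 lie outside the transcript.
local_pos <- genome_to_tx(c(101L, 105L, 150L, 201L, 203L, 50L, 204L),
                          exon_start, exon_width)
local_pos  # 1 5 NA 6 8 NA NA
```

Whatever the two directions end up being called, they should be inverses of each other on exonic positions, which is a convenient property to test.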
On 04/07/12 16:30, Michael Lawrence wrote:
> Maybe we need mapToGlobal and mapToLocal? Or maybe "absolute" and
> "relative" are better terms?
>
> Btw, we are working on an "easier to use" interface for the
> transcriptLocsToRefLocs function and that should be integrated with any
> refactoring/renaming.

I like the idea of the map generic and where it is going. I think the mapToGlobal and mapToLocal terms are clearer. I assume that in mapToGlobal the 'from' would be along the lines of cDNA-based, cds-based, or protein-based coordinates, and that in mapToLocal the 'from' would always be genomic-based coordinates. Yes?

Valerie
On Sat, Apr 7, 2012 at 7:31 PM, Valerie Obenchain wrote:
> Assuming in mapToGlobal the 'from' would be along the lines of
> cDNA-based, cds-based, or protein-based coordinates. In mapToLocal the
> 'from' would always be genomic-based coordinates. Yes?

Yes, that would be the typical use case, although the generic is meant to be more general, i.e., it is in IRanges, not GenomicRanges.
Hi Michael,

That sounds really good! When you talk about refactoring the transcriptLocsToRefLocs function, what do you mean exactly? I didn't find the interface so hard to understand; it took me ~5 mins to figure it out. Some error messages could be more explicit, though. For example, I got the following when tlocs was a list of numeric vectors instead of a list of integer vectors:

    Error in .Call2("tlocs2rlocs", tlocs, exonStarts, exonEnds, strand, decreasing.rank.on.minus.strand, :
      'tlocs' has invalid elements

But that was all, really.

Nico
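Going by the report above, the error goes away once every element of tlocs is an integer vector. A small base-R sketch of that coercion follows; to_integer_list is a hypothetical helper name, and the check that no value has a fractional part is an added safety measure, not something the thread asks for.

```r
# Positions typed as plain numbers (or read from a file) are doubles in R.
tlocs <- list(tx1 = c(1, 5, 8), tx2 = c(2, 3))
vapply(tlocs, is.integer, logical(1))  # FALSE for both elements

# Hypothetical helper: coerce each element to integer, refusing values
# that have a fractional part so nothing is silently truncated.
to_integer_list <- function(x) {
  lapply(x, function(v) {
    stopifnot(is.numeric(v), all(v == round(v)))
    as.integer(v)
  })
}

tlocs_int <- to_integer_list(tlocs)
vapply(tlocs_int, is.integer, logical(1))  # TRUE for both elements
```

The coerced list keeps the element names, so tlocs_int can be passed on in place of tlocs.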
On 04/07/2012 10:45 PM, Michael Lawrence wrote:
> Yes, that would be the typical use case, although the generic is meant
> to be more general, i.e., it is in IRanges, not GenomicRanges.

OK. You previously mentioned that the map generic could be used both to convert between organisms (human reference-based -> pig reference-based) and to convert between coordinate systems within the same organism (human reference-based -> human cds-based). At least I think that's what you had in mind. If this is the case, maybe we need an argument that indicates 'sameOrganism = TRUE'?
On Mon, Apr 9, 2012 at 8:42 AM, Valerie Obenchain <vobencha@fhcrc.org> wrote:

> On 04/07/2012 10:45 PM, Michael Lawrence wrote:
>> On Sat, Apr 7, 2012 at 7:31 PM, Valerie Obenchain <vobencha@fhcrc.org> wrote:
>>> On 04/07/12 16:30, Michael Lawrence wrote:
>>>> On Sat, Apr 7, 2012 at 11:12 AM, Martin Morgan <mtmorgan@fhcrc.org> wrote:
>>>>> Hi Nico -- VariantAnnotation::refLocsToLocalLocs, GenomicFeatures::transcriptLocs2refLocs and IRanges::map might do this for you; no direct experience on my part, though.
>>>>>
>>>>> Martin
>>>>
>>>> Right. Right now, IRanges::map will take things from global to local (either into transcripts or reads, depending on the argument). This takes the place of "refLocsToLocalLocs". What "map" needs to support is the reverse. I think we could do this with either a new function. I am not sure if it should be called reverseMap though, because it's not clear which is forward and which is reverse. Maybe we need mapToGlobal and mapToLocal? Or maybe "absolute" and "relative" are better terms?
>>>>
>>>> Btw, we are working on an "easier to use" interface for the transcriptLocs2refLocs function and that should be integrated with any refactoring/renaming.
>>>>
>>>> Let's get a discussion going.
>>>>
>>>> Michael
>>>
>>> I like the idea of the map generic and where it is going. I think the mapToGlobal and mapToLocal terms are clearer. I assume that in mapToGlobal the 'from' would be along the lines of cDNA-based, cds-based, or protein-based coordinates, and that in mapToLocal the 'from' would always be genomic-based coordinates. Yes?
>>>
>>> Valerie
>>
>> Yes, that would be the typical use case, although the generic is meant to be more general, i.e., it is in IRanges, not GenomicRanges.
>
> OK. You previously mentioned the map generic could be used to both convert between organisms (human reference-based -> pig reference-based) and between coordinate systems within the same organism (human reference-based -> human cds-based). At least I think that's what you had in mind. If this is the case, maybe we need an argument that indicates 'sameOrganism = TRUE'?

I think it would all depend on the alignment that is provided. We could have a Chain method that could go between assemblies, including between species (even though it is not really condoned). The GRangesList method is always going to assume that each element represents the alignment of some sequence (like a refseq) to the same genome build as "from". The compatibility of the genome builds is an easy thing to check, given Seqinfo.
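To make the "global" versus "local" terminology concrete, here is a minimal base-R sketch of both directions for a single transcript. It is only an illustration of the arithmetic, not the IRanges::map implementation: the helper names map_to_global and map_to_local are hypothetical (borrowed from the naming proposal above), the exon coordinates are invented, and a plus-strand transcript with sorted, non-overlapping exons is assumed.

```r
# Hypothetical helpers, not the IRanges API. Exons of ONE plus-strand
# transcript, sorted by genomic start and non-overlapping (1-based, inclusive).
exon_start <- c(101L, 201L, 301L)
exon_end   <- c(110L, 205L, 320L)
exon_width <- exon_end - exon_start + 1L
# 0-based transcript offset of the first base of each exon: 0, 10, 15
exon_offset <- cumsum(exon_width) - exon_width

# local (transcript) position -> global (genomic) position
map_to_global <- function(pos) {
  ok  <- pos >= 1L & pos <= sum(exon_width)
  out <- rep(NA_integer_, length(pos))
  i   <- findInterval(pos[ok] - 1L, exon_offset)   # exon holding each position
  out[ok] <- exon_start[i] + (pos[ok] - 1L - exon_offset[i])
  out
}

# global (genomic) position -> local (transcript) position; NA outside exons
map_to_local <- function(gpos) {
  i <- findInterval(gpos, exon_start)              # last exon starting at or before gpos
  i[i == 0L] <- NA
  ok  <- !is.na(i) & gpos <= exon_end[i]
  out <- rep(NA_integer_, length(gpos))
  out[ok] <- exon_offset[i[ok]] + (gpos[ok] - exon_start[i[ok]]) + 1L
  out
}

map_to_global(c(1L, 11L, 35L))   # 101 201 320
map_to_local(c(203L, 150L))      # 13 NA (150 falls in an intron)
```

A minus-strand transcript would need the exon order and the within-exon offset reversed, which this sketch deliberately leaves out.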
> I think it would all depend on the alignment that is provided. We could have a Chain method

maybe Reduce?

--Malcolm

> that could go between assemblies, including between species (even though it is not really condoned). The GRangesList method is always going to assume that each element represents the alignment of some sequence (like a refseq) to the same genome build as "from". The compatibility of the genome builds is an easy thing to check, given Seqinfo.
On Mon, Apr 9, 2012 at 10:34 AM, Cook, Malcolm <mec@stowers.org> wrote:

>> I think it would all depend on the alignment that is provided. We could have a Chain method
>
> maybe Reduce ?

Sorry, what I meant is that we could have a map method for Chain objects, which represent UCSC Chain files.
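For readers who have not met chain files: the body of a UCSC chain lists ungapped alignment blocks as (size, dt, dq) rows, where dt and dq are the gaps that follow the block on the target and query side. The base-R sketch below shows how such blocks could translate a position from one assembly to the other. It is a toy under stated assumptions (invented numbers, both strands "+", 0-based coordinates as in the chain format) and the chain_map name is hypothetical; it is not how any Bioconductor package implements this.

```r
# Toy chain body: one (size, dt, dq) row per ungapped block; the last block
# has no trailing gaps. All numbers are invented for illustration.
size <- c(100L, 50L, 30L)
dt   <- c(10L, 0L)     # gap after each block on the target side
dq   <- c(0L, 20L)     # gap after each block on the query side
t_start <- 1000L       # tStart from the chain header (0-based)
q_start <- 5000L       # qStart from the chain header (0-based)

# 0-based start of every block on each side.
t_block <- t_start + cumsum(c(0L, head(size, -1L) + dt))   # 1000 1110 1160
q_block <- q_start + cumsum(c(0L, head(size, -1L) + dq))   # 5000 5100 5170

# Hypothetical helper: target position -> query position, NA inside gaps
# or outside the chain.
chain_map <- function(tpos) {
  i <- findInterval(tpos, t_block)
  i[i == 0L] <- NA
  ok  <- !is.na(i) & tpos < t_block[i] + size[i]
  out <- rep(NA_integer_, length(tpos))
  out[ok] <- q_block[i[ok]] + (tpos[ok] - t_block[i[ok]])
  out
}

chain_map(c(1000L, 1105L, 1110L, 1189L))   # 5000 NA 5100 5199
```

For real chain files, rtracklayer provides import.chain and liftOver, which would be the place to look rather than hand-rolled code like this.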
Hi Martin,

I had to change my data structure, but using transcriptLocs2refLocs, computation time is down to one second!!! Amazing, really.

Thanks!!

Nico

On 7 Apr 2012, at 20:12, Martin Morgan wrote:

> Hi Nico -- VariantAnnotation::refLocsToLocalLocs, GenomicFeatures::transcriptLocs2refLocs and IRanges::map might do this for you; no direct experience on my part, though.
>
> Martin
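As a rough illustration of why a vectorised lookup can be so much faster than expanding every transcript into an integer vector inside an sapply, here is a base-R sketch that resolves many (transcript, position) queries with a single findInterval call over a flat exon table. It is not the transcriptLocs2refLocs implementation; the table layout, the toy coordinates, and the tx_to_genome name are assumptions made for this example, and plus-strand transcripts with exons grouped by transcript and sorted by start are assumed.

```r
# Toy flat exon table: rows grouped by transcript, sorted by start within each.
exons <- data.frame(
  tx    = c("txA", "txA", "txB", "txB", "txB"),
  start = c(101L, 201L, 1001L, 1101L, 1201L),
  end   = c(110L, 205L, 1003L, 1104L, 1210L),
  stringsAsFactors = FALSE
)
exons$width <- exons$end - exons$start + 1L

# 0-based offset of each exon in the concatenation of ALL exons.
global_offset <- cumsum(exons$width) - exons$width
# Length of each transcript and the global offset of its first exon.
tx_len  <- vapply(split(exons$width, exons$tx), sum, integer(1))
tx_base <- global_offset[match(names(tx_len), exons$tx)]
names(tx_base) <- names(tx_len)

# Hypothetical helper: transcript name + 1-based transcript position -> genomic
# position, for all queries at once. Unknown transcripts or out-of-range
# positions give NA.
tx_to_genome <- function(tx, pos) {
  g   <- unname(tx_base[tx]) + pos - 1L
  ok  <- !is.na(g) & pos >= 1L & pos <= unname(tx_len[tx])
  out <- rep(NA_integer_, length(pos))
  i   <- findInterval(g[ok], global_offset)
  out[ok] <- exons$start[i] + (g[ok] - global_offset[i])
  out
}

tx_to_genome(c("txB", "txA", "txB", "txA"), c(4L, 11L, 8L, 16L))
# 1101 201 1201 NA
```

Because the transcript name is part of each query, this keeps the property asked for in the original post: the position is always looked up in the right transcript.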