rtracklayer::liftOver ordering
Andrew Jaffe ▴ 120
@andrew-jaffe-4820
Last seen 10.1 years ago
I'm having a problem maintaining the ordering of my GRanges object when I lift it over using rtracklayer::liftOver. For example:

> g # my regions
GRanges with 5 ranges and 0 elementMetadata values
    seqnames                 ranges strand |
       <Rle>              <IRanges>  <Rle> |
[1]    chr19 [ 13130686,  13133039]      * |
[2]     chr4 [160026138, 160028079]      * |
[3]    chr12 [ 65671230,  65672140]      * |
[4]     chr8 [ 19615409,  19616461]      * |
[5]    chr14 [ 99706752,  99708661]      * |

> chain = import.chain("hg19ToHg18.over.chain") # from UCSC
> lifted = liftOver(g, chain) # suppressed unmatched chrs
> lifted
GRanges with 5 ranges and 0 elementMetadata values
    seqnames                 ranges strand |
       <Rle>              <IRanges>  <Rle> |
[1]     chr4 [160245588, 160247529]      * |
[2]     chr8 [ 19659689,  19660741]      * |
[3]    chr12 [ 63957497,  63958407]      * |
[4]    chr14 [ 98776505,  98778414]      * |
[5]    chr19 [ 12991686,  12994039]      * |

This is just a toy example with 5 regions all on different chromosomes, but with real data where there are multiple regions per chromosome, I am unable to determine which lifted range corresponds to a particular input region. Is there any way to preserve the ordering of my original list in the liftOver output? Presorting by chromosome and position might work 99% of the time, but the ordering of some regions might shift during the liftOver, and I would not be able to tell if this occurred.

Thanks a lot,
Andrew Jaffe
@michael-lawrence-3846
Last seen 2.8 years ago
United States
On Wed, Aug 24, 2011 at 8:28 AM, Andrew Jaffe wrote:

> Is there any way to preserve the ordering of my original list in the liftOver output?

I think Kasper's suggestion of an ID column is a good one. The basic problem is that there is not necessarily a 1-1 correspondence after lift-over. A single region in say human could be broken up into multiple regions in mouse.

Michael
Hi there,

On 11-08-24 10:48 AM, Michael Lawrence wrote:

> I think Kasper's suggestion of an ID column is a good one. The basic problem is that there is not necessarily a 1-1 correspondence after lift-over. A single region in say human could be broken up into multiple regions in mouse.

An alternative would be that liftOver() returns a GRangesList instead of GRanges. People who don't care about the exact mapping between the input and the output could always do 'unlist(liftOver(g, chain))' and get what they are getting right now.

H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
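A minimal sketch of how a list-valued result could be used, assuming a version of rtracklayer in which liftOver() returns a GRangesList with one element per input range, as proposed in this thread. The liftOver() call itself is left as a comment because it needs the chain file. The object `lifted` below is a hand-built stand-in: its first two elements reuse the hg18 coordinates from the original post for the first two input ranges, and the empty third element is made up to illustrate an input range that did not map.

```r
library(GenomicRanges)

# With rtracklayer and the chain file at hand, this would be (not run here):
#   library(rtracklayer)
#   chain  <- import.chain("hg19ToHg18.over.chain")
#   lifted <- liftOver(g, chain)

# Stand-in for that result (assumption: one list element per input range).
hits <- GRanges(c("chr19", "chr4"),
                IRanges(c(12991686, 160245588), c(12994039, 160247529)))
# Partition ends 1, 2, 2 give elements of size 1, 1 and 0.
lifted <- relist(hits, PartitioningByEnd(c(1L, 2L, 2L)))

# Number of lifted ranges per input range, in input order.
n_out <- elementNROWS(lifted)
unmapped   <- which(n_out == 0L)   # inputs with no lifted range
one_to_one <- which(n_out == 1L)   # inputs with exactly one lifted range

# Flattening discards the grouping and gives a plain GRanges again.
flat <- unlist(lifted, use.names = FALSE)
```

Because element i of the list belongs to input range i, the mapping survives even when a range is dropped or split, which a flat GRanges cannot express on its own.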
That's a good idea. I can make that change.

Michael
How efficient would this be? I sometimes use liftOver on millions of regions.

Kasper
On Wed, Aug 24, 2011 at 6:42 PM, Kasper Daniel Hansen wrote:

> How efficient would this be? I sometimes use liftOver on millions of regions.

Performance should not be a concern. The GRangesList is compressed, so no splitting and unsplitting actually occur. In my tests, the current liftOver() is much faster than the UCSC tool, and it doesn't have the licensing issues.

Michael
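To illustrate the point about compression, here is a small sketch of what a compressed GRangesList looks like from the outside: as I understand it, the list is one flat GRanges plus a vector of partition ends, so grouping and flattening do not copy ranges into separate objects. The coordinates are made up for illustration and have nothing to do with the chain file.

```r
library(GenomicRanges)

# Toy ranges (made-up coordinates, for illustration only).
flat <- GRanges("chr1", IRanges(start = c(1, 11, 21, 31), width = 5))

# Group them as 2 + 0 + 2 ranges by stating where each group ends.
grl <- relist(flat, PartitioningByEnd(c(2L, 2L, 4L)))

# The result is a compressed list: flat data plus partition ends.
is_compressed <- is(grl, "CompressedList")
group_ends    <- end(PartitioningByEnd(grl))

# Flattening hands back the ranges in their original order.
flat_again <- unlist(grl, use.names = FALSE)
```

With this layout the cost of building or flattening the list depends on the bookkeeping vector rather than on creating one small object per region, which is why millions of regions should not be a problem.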
@kasper-daniel-hansen-2979
Last seen 15 months ago
United States
I have not used the liftOver tool from rtracklayer and I am happy to see it exists. What I have been doing, using the command line tool and a wrapper script from R, is to add a "name" column that is really the row number of the original region. The command line version then outputs this as a name column in the text file. This allows me to go back and see how they are matched up 1-1. I find this indispensable.

Andrew: you might be able to fake this by adding names to your initial GRanges object or by adding a metadata column.

Kasper
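The ID-column idea can be sketched for a GRanges object as follows. This is only an illustration under stated assumptions: the column name `id` is arbitrary, and it assumes that metadata columns are carried through liftOver(), which should be checked against the installed rtracklayer version. The liftOver() call is left as a comment because it needs the chain file; the stand-in `lifted` uses the coordinates and output order shown in the original post, with each `id` assigned by matching chromosomes (every toy region sits on a different chromosome).

```r
library(GenomicRanges)

# The five hg19 regions from the original post.
g <- GRanges(
  c("chr19", "chr4", "chr12", "chr8", "chr14"),
  IRanges(c(13130686, 160026138, 65671230, 19615409, 99706752),
          c(13133039, 160028079, 65672140, 19616461, 99708661))
)

# Tag each region with its row number before lifting.
mcols(g)$id <- seq_along(g)

# With rtracklayer and the chain file available (not run here):
#   library(rtracklayer)
#   chain  <- import.chain("hg19ToHg18.over.chain")
#   lifted <- liftOver(g, chain)   # unlist() first if a list is returned
# Assumption: the "id" column is carried through to the lifted ranges.

# Stand-in for the lifted result, in the order shown in the post.
lifted <- GRanges(
  c("chr4", "chr8", "chr12", "chr14", "chr19"),
  IRanges(c(160245588, 19659689, 63957497, 98776505, 12991686),
          c(160247529, 19660741, 63958407, 98778414, 12994039)),
  id = c(2L, 4L, 3L, 5L, 1L)
)

# Restore the input order. Ids missing from `lifted` mark unmapped
# regions; repeated ids mark regions that were split.
restored <- lifted[order(lifted$id)]
unmapped <- setdiff(g$id, lifted$id)
split_up <- unique(lifted$id[duplicated(lifted$id)])
```

Sorting on the id, rather than on chromosome and position, keeps the match correct even when lifting changes the relative order of regions.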