Computing the coverage on a bamfile with a huge amount of seqlevels

0

Entering edit mode

Stefanie ▴ 360

@stefanie-5192

Last seen 10.7 years ago

What is the best way to compute the coverage on a ReadGAlignments object when I have many, many (about 50 000) seqlevels? Just computing coverage(object) is terribly slow due to the huge amount of seqlevels. At the moment, I am subsetting the ReadGAlignments object for each seqlevel, I omit the redundant seqlevels and calculate the coverage. This is fast if you do it for a few seqlevels. But if one really would like to compute the coverage for each seqlevel individually, this really takes a long time.. I would appreciate any comments! Best, Stefanie

Coverage Coverage • 1.7k views

ADD COMMENT • link updated 11.5 years ago by Hervé Pagès 16k • written 11.5 years ago by Stefanie ▴ 360

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 7 hours ago

Seattle, WA, United States

Hi Stephanie, On 10/17/2013 07:37 AM, Stefanie Tauber wrote: > What is the best way to compute the coverage on a ReadGAlignments object > when I have many, many (about 50 000) seqlevels? > > Just computing coverage(object) is terribly slow due to the huge amount of seqlevels. > > At the moment, I am subsetting the ReadGAlignments object for each seqlevel, > I omit the redundant seqlevels and calculate the coverage. > This is fast if you do it for a few seqlevels. > But if one really would like to compute the coverage for each seqlevel individually, > this really takes a long time.. Yes this is a known weakness in the current implementation of coverage(). It loops over the seqlevels and this loop is currently performed in R, which is slow. Unfortunately this is not the first time someone complains about this. Time for us to bite the bullet I guess. I've just started to work on moving the loop to the C-level. That should speed up things significantly. I'll let you know when this is ready. Thanks for the feedback. H. > > > I would appreciate any comments! > > Best, > Stefanie > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD COMMENT • link 11.5 years ago Hervé Pagès 16k

0

Entering edit mode

Thanks for the answer! Just wanted to make sure that I don't miss a relevant feature . Best, Stefanie Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages@fhcrc.org>: > Hi Stephanie, > > On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >> What is the best way to compute the coverage on a ReadGAlignments object >> when I have many, many (about 50 000) seqlevels? >> >> Just computing coverage(object) is terribly slow due to the huge amount of seqlevels. >> >> At the moment, I am subsetting the ReadGAlignments object for each seqlevel, >> I omit the redundant seqlevels and calculate the coverage. >> This is fast if you do it for a few seqlevels. >> But if one really would like to compute the coverage for each seqlevel individually, >> this really takes a long time.. > > Yes this is a known weakness in the current implementation of > coverage(). It loops over the seqlevels and this loop is currently > performed in R, which is slow. Unfortunately this is not the first > time someone complains about this. > > Time for us to bite the bullet I guess. I've just started to work > on moving the loop to the C-level. That should speed up things > significantly. I'll let you know when this is ready. > > Thanks for the feedback. > H. > >> >> >> I would appreciate any comments! >> >> Best, >> Stefanie >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Stefanie ▴ 360

0

Entering edit mode

Hi Stephanie, Just to let you know that in GenomicRanges 1.14.2, coverage() should now be much faster on objects with many seqlevels. Should be about 1000x faster or more if you have 50 000 seqlevels. The only case where it's still slow is if many seqlevels are marked as circular (and have a non-NA seqlength) in seqinfo(x), because for these seqlevels the coverage vector is still post processed by a loop in R. Could be a problem if for example people have BAM files with reads aligned to thousands of bacterial genomes. GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru biocLite() in the next 24 hours or so. The man pages for the "coverage" methods have been improved, especially in IRanges. Cheers, H. On 10/18/2013 12:19 AM, Stefanie Tauber wrote: > Thanks for the answer! > Just wanted to make sure that I don't miss a relevant feature?. > > Best, > Stefanie > > Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">>: > >> Hi Stephanie, >> >> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >>> What is the best way to compute the coverage on a ReadGAlignments object >>> when I have many, many (about 50 000) seqlevels? >>> >>> Just computing coverage(object) is terribly slow due to the huge >>> amount of seqlevels. >>> >>> At the moment, I am subsetting the ReadGAlignments object for each >>> seqlevel, >>> I omit the redundant seqlevels and calculate the coverage. >>> This is fast if you do it for a few seqlevels. >>> But if one really would like to compute the coverage for each >>> seqlevel individually, >>> this really takes a long time.. >> >> Yes this is a known weakness in the current implementation of >> coverage(). It loops over the seqlevels and this loop is currently >> performed in R, which is slow. Unfortunately this is not the first >> time someone complains about this. >> >> Time for us to bite the bullet I guess. I've just started to work >> on moving the loop to the C-level. That should speed up things >> significantly. I'll let you know when this is ready. >> >> Thanks for the feedback. >> H. >> >>> >>> >>> I would appreciate any comments! >>> >>> Best, >>> Stefanie >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 11.5 years ago Hervé Pagès 16k

0

Entering edit mode

Thanks a lot! Can?t wait to try the new version! Best, Stefanie Am 25.10.2013 um 12:44 schrieb Hervé Pagès <hpages at="" fhcrc.org="">: > Hi Stephanie, > > Just to let you know that in GenomicRanges 1.14.2, coverage() should > now be much faster on objects with many seqlevels. Should be about > 1000x faster or more if you have 50 000 seqlevels. > > The only case where it's still slow is if many seqlevels are marked as > circular (and have a non-NA seqlength) in seqinfo(x), because for > these seqlevels the coverage vector is still post processed by a > loop in R. Could be a problem if for example people have BAM files > with reads aligned to thousands of bacterial genomes. > > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru > biocLite() in the next 24 hours or so. The man pages for the "coverage" > methods have been improved, especially in IRanges. > > Cheers, > H. > > > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: >> Thanks for the answer! >> Just wanted to make sure that I don't miss a relevant feature?. >> >> Best, >> Stefanie >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org="">> <mailto:hpages at="" fhcrc.org="">>: >> >>> Hi Stephanie, >>> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >>>> What is the best way to compute the coverage on a ReadGAlignments object >>>> when I have many, many (about 50 000) seqlevels? >>>> >>>> Just computing coverage(object) is terribly slow due to the huge >>>> amount of seqlevels. >>>> >>>> At the moment, I am subsetting the ReadGAlignments object for each >>>> seqlevel, >>>> I omit the redundant seqlevels and calculate the coverage. >>>> This is fast if you do it for a few seqlevels. >>>> But if one really would like to compute the coverage for each >>>> seqlevel individually, >>>> this really takes a long time.. >>> >>> Yes this is a known weakness in the current implementation of >>> coverage(). It loops over the seqlevels and this loop is currently >>> performed in R, which is slow. Unfortunately this is not the first >>> time someone complains about this. >>> >>> Time for us to bite the bullet I guess. I've just started to work >>> on moving the loop to the C-level. That should speed up things >>> significantly. I'll let you know when this is ready. >>> >>> Thanks for the feedback. >>> H. >>> >>>> >>>> >>>> I would appreciate any comments! >>>> >>>> Best, >>>> Stefanie >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >> >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319

ADD REPLY • link 11.5 years ago Stefanie ▴ 360

0

Entering edit mode

Hi Herv?, I just wanted to try the new coverage function: # I read in a bam file, human data mapped against the transcriptome x = readGAlignments(human_trans_bam) # This is my GAligments object > x GAlignments with 46765032 alignments and 0 metadata columns: seqnames strand cigar qwidth <rle> <rle> <character> <integer> [1] hg19_ensGene_ENST00000237247 + 75M 75 [2] hg19_ensGene_ENST00000371036 - 75M 75 [3] hg19_ensGene_ENST00000371037 - 75M 75 [4] hg19_ensGene_ENST00000471889 - 75M 75 [5] hg19_ensGene_ENST00000377479 + 75M 75 ... ... ... ... ... [46765028] hg19_ensGene_ENST00000344867 - 75M 75 [46765029] hg19_ensGene_ENST00000400848 - 75M 75 [46765030] hg19_ensGene_ENST00000400848 - 75M 75 [46765031] hg19_ensGene_ENST00000400848 - 75M 75 [46765032] hg19_ensGene_ENST00000400829 - 75M 75 start end width ngap <integer> <integer> <integer> <integer> [1] 1783 1857 75 0 [2] 690 764 75 0 [3] 1632 1706 75 0 [4] 2121 2195 75 0 [5] 391 465 75 0 ... ... ... ... ... [46765028] 280 354 75 0 [46765029] 509 583 75 0 [46765030] 509 583 75 0 [46765031] 509 583 75 0 [46765032] 179 253 75 0 --- seqlengths: hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 2580 ... 999 When I want to compute the coverage, I get the following: > cov_x = coverage(x) Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object '.Ranges.coverage' not found Do you know what I am doing wrong? Thanks, Stefanie On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: > Hi Stephanie, > > Just to let you know that in GenomicRanges 1.14.2, coverage() should > now be much faster on objects with many seqlevels. Should be about > 1000x faster or more if you have 50 000 seqlevels. > > The only case where it's still slow is if many seqlevels are marked as > circular (and have a non-NA seqlength) in seqinfo(x), because for > these seqlevels the coverage vector is still post processed by a > loop in R. Could be a problem if for example people have BAM files > with reads aligned to thousands of bacterial genomes. > > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru > biocLite() in the next 24 hours or so. The man pages for the "coverage" > methods have been improved, especially in IRanges. > > Cheers, > H. > > > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: >> Thanks for the answer! >> Just wanted to make sure that I don't miss a relevant feature?. >> >> Best, >> Stefanie >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org="">> <mailto:hpages at="" fhcrc.org="">>: >> >>> Hi Stephanie, >>> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >>>> What is the best way to compute the coverage on a ReadGAlignments >>>> object >>>> when I have many, many (about 50 000) seqlevels? >>>> >>>> Just computing coverage(object) is terribly slow due to the huge >>>> amount of seqlevels. >>>> >>>> At the moment, I am subsetting the ReadGAlignments object for each >>>> seqlevel, >>>> I omit the redundant seqlevels and calculate the coverage. >>>> This is fast if you do it for a few seqlevels. >>>> But if one really would like to compute the coverage for each >>>> seqlevel individually, >>>> this really takes a long time.. >>> >>> Yes this is a known weakness in the current implementation of >>> coverage(). It loops over the seqlevels and this loop is currently >>> performed in R, which is slow. Unfortunately this is not the first >>> time someone complains about this. >>> >>> Time for us to bite the bullet I guess. I've just started to work >>> on moving the loop to the C-level. That should speed up things >>> significantly. I'll let you know when this is ready. >>> >>> Thanks for the feedback. >>> H. >>> >>>> >>>> >>>> I would appreciate any comments! >>>> >>>> Best, >>>> Stefanie >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319

ADD REPLY • link 11.5 years ago Stefanie ▴ 360

0

Entering edit mode

Probably need to update GenomicRanges. On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < stefanie.tauber@univie.ac.at> wrote: > Hi Hervé, > > I just wanted to try the new coverage function: > > # I read in a bam file, human data mapped against the transcriptome > x = readGAlignments(human_trans_bam) > > # This is my GAligments object > > x > GAlignments with 46765032 alignments and 0 metadata columns: > seqnames strand cigar qwidth > <rle> <rle> <character> <integer> > [1] hg19_ensGene_ENST00000237247 + 75M 75 > [2] hg19_ensGene_ENST00000371036 - 75M 75 > [3] hg19_ensGene_ENST00000371037 - 75M 75 > [4] hg19_ensGene_ENST00000471889 - 75M 75 > [5] hg19_ensGene_ENST00000377479 + 75M 75 > ... ... ... ... ... > [46765028] hg19_ensGene_ENST00000344867 - 75M 75 > [46765029] hg19_ensGene_ENST00000400848 - 75M 75 > [46765030] hg19_ensGene_ENST00000400848 - 75M 75 > [46765031] hg19_ensGene_ENST00000400848 - 75M 75 > [46765032] hg19_ensGene_ENST00000400829 - 75M 75 > start end width ngap > <integer> <integer> <integer> <integer> > [1] 1783 1857 75 0 > [2] 690 764 75 0 > [3] 1632 1706 75 0 > [4] 2121 2195 75 0 > [5] 391 465 75 0 > ... ... ... ... ... > [46765028] 280 354 75 0 > [46765029] 509 583 75 0 > [46765030] 509 583 75 0 > [46765031] 509 583 75 0 > [46765032] 179 253 75 0 > --- > seqlengths: > hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 > 2580 ... 999 > > When I want to compute the coverage, I get the following: > > cov_x = coverage(x) > > Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > object '.Ranges.coverage' not found > > Do you know what I am doing wrong? > > Thanks, > Stefanie > > > > On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: > > Hi Stephanie, > > > > Just to let you know that in GenomicRanges 1.14.2, coverage() should > > now be much faster on objects with many seqlevels. Should be about > > 1000x faster or more if you have 50 000 seqlevels. > > > > The only case where it's still slow is if many seqlevels are marked as > > circular (and have a non-NA seqlength) in seqinfo(x), because for > > these seqlevels the coverage vector is still post processed by a > > loop in R. Could be a problem if for example people have BAM files > > with reads aligned to thousands of bacterial genomes. > > > > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru > > biocLite() in the next 24 hours or so. The man pages for the "coverage" > > methods have been improved, especially in IRanges. > > > > Cheers, > > H. > > > > > > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: > >> Thanks for the answer! > >> Just wanted to make sure that I don't miss a relevant feature . > >> > >> Best, > >> Stefanie > >> > >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages@fhcrc.org> >> <mailto:hpages@fhcrc.org>>: > >> > >>> Hi Stephanie, > >>> > >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: > >>>> What is the best way to compute the coverage on a ReadGAlignments > >>>> object > >>>> when I have many, many (about 50 000) seqlevels? > >>>> > >>>> Just computing coverage(object) is terribly slow due to the huge > >>>> amount of seqlevels. > >>>> > >>>> At the moment, I am subsetting the ReadGAlignments object for each > >>>> seqlevel, > >>>> I omit the redundant seqlevels and calculate the coverage. > >>>> This is fast if you do it for a few seqlevels. > >>>> But if one really would like to compute the coverage for each > >>>> seqlevel individually, > >>>> this really takes a long time.. > >>> > >>> Yes this is a known weakness in the current implementation of > >>> coverage(). It loops over the seqlevels and this loop is currently > >>> performed in R, which is slow. Unfortunately this is not the first > >>> time someone complains about this. > >>> > >>> Time for us to bite the bullet I guess. I've just started to work > >>> on moving the loop to the C-level. That should speed up things > >>> significantly. I'll let you know when this is ready. > >>> > >>> Thanks for the feedback. > >>> H. > >>> > >>>> > >>>> > >>>> I would appreciate any comments! > >>>> > >>>> Best, > >>>> Stefanie > >>>> > >>>> _______________________________________________ > >>>> Bioconductor mailing list > >>>> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>>> Search the archives: > >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>>> > >>> > >>> -- > >>> Hervé Pagès > >>> > >>> Program in Computational Biology > >>> Division of Public Health Sciences > >>> Fred Hutchinson Cancer Research Center > >>> 1100 Fairview Ave. N, M1-B514 > >>> P.O. Box 19024 > >>> Seattle, WA 98109-1024 > >>> > >>> E-mail: hpages@fhcrc.org <mailto:hpages@fhcrc.org> > >>> Phone: (206) 667-5791 > >>> Fax: (206) 667-1319 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Here is my SessionInfo, GenomicRanges as well as IRanges should be up-to-date. R Under development (unstable) (2013-10-15 r64056) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsamtools_1.15.2 Biostrings_2.31.0 GenomicFeatures_1.15.2 [4] AnnotationDbi_1.25.2 Biobase_2.23.1 GenomicRanges_1.15.1 [7] XVector_0.3.0 IRanges_1.21.5 BiocGenerics_0.9.0 loaded via a namespace (and not attached): [1] biomaRt_2.19.0 bitops_1.0-6 BSgenome_1.31.0 DBI_0.2-7 [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.23.1 stats4_3.1.0 [9] tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 Best, Stefanie On Mo, 28.10.2013, 14:56, Michael Lawrence wrote: > Probably need to update GenomicRanges. > > > On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < > stefanie.tauber at univie.ac.at> wrote: > >> Hi Herv?, >> >> I just wanted to try the new coverage function: >> >> # I read in a bam file, human data mapped against the transcriptome >> x = readGAlignments(human_trans_bam) >> >> # This is my GAligments object >> > x >> GAlignments with 46765032 alignments and 0 metadata columns: >> seqnames strand cigar qwidth >> <rle> <rle> <character> <integer> >> [1] hg19_ensGene_ENST00000237247 + 75M 75 >> [2] hg19_ensGene_ENST00000371036 - 75M 75 >> [3] hg19_ensGene_ENST00000371037 - 75M 75 >> [4] hg19_ensGene_ENST00000471889 - 75M 75 >> [5] hg19_ensGene_ENST00000377479 + 75M 75 >> ... ... ... ... ... >> [46765028] hg19_ensGene_ENST00000344867 - 75M 75 >> [46765029] hg19_ensGene_ENST00000400848 - 75M 75 >> [46765030] hg19_ensGene_ENST00000400848 - 75M 75 >> [46765031] hg19_ensGene_ENST00000400848 - 75M 75 >> [46765032] hg19_ensGene_ENST00000400829 - 75M 75 >> start end width ngap >> <integer> <integer> <integer> <integer> >> [1] 1783 1857 75 0 >> [2] 690 764 75 0 >> [3] 1632 1706 75 0 >> [4] 2121 2195 75 0 >> [5] 391 465 75 0 >> ... ... ... ... ... >> [46765028] 280 354 75 0 >> [46765029] 509 583 75 0 >> [46765030] 509 583 75 0 >> [46765031] 509 583 75 0 >> [46765032] 179 253 75 0 >> --- >> seqlengths: >> hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 >> 2580 ... 999 >> >> When I want to compute the coverage, I get the following: >> > cov_x = coverage(x) >> >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : >> object '.Ranges.coverage' not found >> >> Do you know what I am doing wrong? >> >> Thanks, >> Stefanie >> >> >> >> On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: >> > Hi Stephanie, >> > >> > Just to let you know that in GenomicRanges 1.14.2, coverage() should >> > now be much faster on objects with many seqlevels. Should be about >> > 1000x faster or more if you have 50 000 seqlevels. >> > >> > The only case where it's still slow is if many seqlevels are marked as >> > circular (and have a non-NA seqlength) in seqinfo(x), because for >> > these seqlevels the coverage vector is still post processed by a >> > loop in R. Could be a problem if for example people have BAM files >> > with reads aligned to thousands of bacterial genomes. >> > >> > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru >> > biocLite() in the next 24 hours or so. The man pages for the >> "coverage" >> > methods have been improved, especially in IRanges. >> > >> > Cheers, >> > H. >> > >> > >> > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: >> >> Thanks for the answer! >> >> Just wanted to make sure that I don't miss a relevant feature?. >> >> >> >> Best, >> >> Stefanie >> >> >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org="">> >> <mailto:hpages at="" fhcrc.org="">>: >> >> >> >>> Hi Stephanie, >> >>> >> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >> >>>> What is the best way to compute the coverage on a ReadGAlignments >> >>>> object >> >>>> when I have many, many (about 50 000) seqlevels? >> >>>> >> >>>> Just computing coverage(object) is terribly slow due to the huge >> >>>> amount of seqlevels. >> >>>> >> >>>> At the moment, I am subsetting the ReadGAlignments object for each >> >>>> seqlevel, >> >>>> I omit the redundant seqlevels and calculate the coverage. >> >>>> This is fast if you do it for a few seqlevels. >> >>>> But if one really would like to compute the coverage for each >> >>>> seqlevel individually, >> >>>> this really takes a long time.. >> >>> >> >>> Yes this is a known weakness in the current implementation of >> >>> coverage(). It loops over the seqlevels and this loop is currently >> >>> performed in R, which is slow. Unfortunately this is not the first >> >>> time someone complains about this. >> >>> >> >>> Time for us to bite the bullet I guess. I've just started to work >> >>> on moving the loop to the C-level. That should speed up things >> >>> significantly. I'll let you know when this is ready. >> >>> >> >>> Thanks for the feedback. >> >>> H. >> >>> >> >>>> >> >>>> >> >>>> I would appreciate any comments! >> >>>> >> >>>> Best, >> >>>> Stefanie >> >>>> >> >>>> _______________________________________________ >> >>>> Bioconductor mailing list >> >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>>> Search the archives: >> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>>> >> >>> >> >>> -- >> >>> Hervé Pagès >> >>> >> >>> Program in Computational Biology >> >>> Division of Public Health Sciences >> >>> Fred Hutchinson Cancer Research Center >> >>> 1100 Fairview Ave. N, M1-B514 >> >>> P.O. Box 19024 >> >>> Seattle, WA 98109-1024 >> >>> >> >>> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> >> >>> Phone: (206) 667-5791 >> >>> Fax: (206) 667-1319 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > -- _____________________________________________________________ Stefanie Tauber, PhD Center for Integrative Bioinformatics Vienna (CIBIV) Max F. Perutz Laboratories (MFPL) Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 Dr. Bohr Gasse 9 A-1030 Wien, Austria Phone: ++43 +1 / 42772-4030 Fax: ++43 +1 / 42772-4098 email: stefanie.tauber at univie.ac.at

ADD REPLY • link 11.5 years ago Stefanie ▴ 360

0

Entering edit mode

Looks like there's a problem preventing newer GenomicRanges from being published. Doesn't look like it's my fault this time ;) Only option is to grab from svn right now. Michael On Mon, Oct 28, 2013 at 7:00 AM, Stefanie Tauber < stefanie.tauber@univie.ac.at> wrote: > Here is my SessionInfo, > GenomicRanges as well as IRanges should be up-to-date. > > R Under development (unstable) (2013-10-15 r64056) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsamtools_1.15.2 Biostrings_2.31.0 GenomicFeatures_1.15.2 > [4] AnnotationDbi_1.25.2 Biobase_2.23.1 GenomicRanges_1.15.1 > [7] XVector_0.3.0 IRanges_1.21.5 BiocGenerics_0.9.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.19.0 bitops_1.0-6 BSgenome_1.31.0 DBI_0.2-7 > [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.23.1 stats4_3.1.0 > [9] tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 > > > Best, > Stefanie > > On Mo, 28.10.2013, 14:56, Michael Lawrence wrote: > > Probably need to update GenomicRanges. > > > > > > On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < > > stefanie.tauber@univie.ac.at> wrote: > > > >> Hi Hervé, > >> > >> I just wanted to try the new coverage function: > >> > >> # I read in a bam file, human data mapped against the transcriptome > >> x = readGAlignments(human_trans_bam) > >> > >> # This is my GAligments object > >> > x > >> GAlignments with 46765032 alignments and 0 metadata columns: > >> seqnames strand cigar qwidth > >> <rle> <rle> <character> <integer> > >> [1] hg19_ensGene_ENST00000237247 + 75M 75 > >> [2] hg19_ensGene_ENST00000371036 - 75M 75 > >> [3] hg19_ensGene_ENST00000371037 - 75M 75 > >> [4] hg19_ensGene_ENST00000471889 - 75M 75 > >> [5] hg19_ensGene_ENST00000377479 + 75M 75 > >> ... ... ... ... ... > >> [46765028] hg19_ensGene_ENST00000344867 - 75M 75 > >> [46765029] hg19_ensGene_ENST00000400848 - 75M 75 > >> [46765030] hg19_ensGene_ENST00000400848 - 75M 75 > >> [46765031] hg19_ensGene_ENST00000400848 - 75M 75 > >> [46765032] hg19_ensGene_ENST00000400829 - 75M 75 > >> start end width ngap > >> <integer> <integer> <integer> <integer> > >> [1] 1783 1857 75 0 > >> [2] 690 764 75 0 > >> [3] 1632 1706 75 0 > >> [4] 2121 2195 75 0 > >> [5] 391 465 75 0 > >> ... ... ... ... ... > >> [46765028] 280 354 75 0 > >> [46765029] 509 583 75 0 > >> [46765030] 509 583 75 0 > >> [46765031] 509 583 75 0 > >> [46765032] 179 253 75 0 > >> --- > >> seqlengths: > >> hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 > >> 2580 ... 999 > >> > >> When I want to compute the coverage, I get the following: > >> > cov_x = coverage(x) > >> > >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > >> object '.Ranges.coverage' not found > >> > >> Do you know what I am doing wrong? > >> > >> Thanks, > >> Stefanie > >> > >> > >> > >> On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: > >> > Hi Stephanie, > >> > > >> > Just to let you know that in GenomicRanges 1.14.2, coverage() should > >> > now be much faster on objects with many seqlevels. Should be about > >> > 1000x faster or more if you have 50 000 seqlevels. > >> > > >> > The only case where it's still slow is if many seqlevels are marked as > >> > circular (and have a non-NA seqlength) in seqinfo(x), because for > >> > these seqlevels the coverage vector is still post processed by a > >> > loop in R. Could be a problem if for example people have BAM files > >> > with reads aligned to thousands of bacterial genomes. > >> > > >> > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru > >> > biocLite() in the next 24 hours or so. The man pages for the > >> "coverage" > >> > methods have been improved, especially in IRanges. > >> > > >> > Cheers, > >> > H. > >> > > >> > > >> > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: > >> >> Thanks for the answer! > >> >> Just wanted to make sure that I don't miss a relevant feature . > >> >> > >> >> Best, > >> >> Stefanie > >> >> > >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages@fhcrc.org> >> >> <mailto:hpages@fhcrc.org>>: > >> >> > >> >>> Hi Stephanie, > >> >>> > >> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: > >> >>>> What is the best way to compute the coverage on a ReadGAlignments > >> >>>> object > >> >>>> when I have many, many (about 50 000) seqlevels? > >> >>>> > >> >>>> Just computing coverage(object) is terribly slow due to the huge > >> >>>> amount of seqlevels. > >> >>>> > >> >>>> At the moment, I am subsetting the ReadGAlignments object for each > >> >>>> seqlevel, > >> >>>> I omit the redundant seqlevels and calculate the coverage. > >> >>>> This is fast if you do it for a few seqlevels. > >> >>>> But if one really would like to compute the coverage for each > >> >>>> seqlevel individually, > >> >>>> this really takes a long time.. > >> >>> > >> >>> Yes this is a known weakness in the current implementation of > >> >>> coverage(). It loops over the seqlevels and this loop is currently > >> >>> performed in R, which is slow. Unfortunately this is not the first > >> >>> time someone complains about this. > >> >>> > >> >>> Time for us to bite the bullet I guess. I've just started to work > >> >>> on moving the loop to the C-level. That should speed up things > >> >>> significantly. I'll let you know when this is ready. > >> >>> > >> >>> Thanks for the feedback. > >> >>> H. > >> >>> > >> >>>> > >> >>>> > >> >>>> I would appreciate any comments! > >> >>>> > >> >>>> Best, > >> >>>> Stefanie > >> >>>> > >> >>>> _______________________________________________ > >> >>>> Bioconductor mailing list > >> >>>> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >>>> Search the archives: > >> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> >>>> > >> >>> > >> >>> -- > >> >>> Hervé Pagès > >> >>> > >> >>> Program in Computational Biology > >> >>> Division of Public Health Sciences > >> >>> Fred Hutchinson Cancer Research Center > >> >>> 1100 Fairview Ave. N, M1-B514 > >> >>> P.O. Box 19024 > >> >>> Seattle, WA 98109-1024 > >> >>> > >> >>> E-mail: hpages@fhcrc.org <mailto:hpages@fhcrc.org> > >> >>> Phone: (206) 667-5791 > >> >>> Fax: (206) 667-1319 > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > > > -- > _____________________________________________________________ > > Stefanie Tauber, PhD > > Center for Integrative Bioinformatics Vienna (CIBIV) > Max F. Perutz Laboratories (MFPL) > Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 > Dr. Bohr Gasse 9 > A-1030 Wien, Austria > Phone: ++43 +1 / 42772-4030 > Fax: ++43 +1 / 42772-4098 > email: stefanie.tauber@univie.ac.at > > _____________________________________________________________ > > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Hi Stephanie, GenomicRanges 1.14.2 (in BioC release) is available since last Friday. In BioC devel though, other problems got in the way but GenomicRanges 1.15.5 is finally out there (since a couple of hours). Cheers, H. On 10/28/2013 09:24 AM, Michael Lawrence wrote: > Looks like there's a problem preventing newer GenomicRanges from being > published. Doesn't look like it's my fault this time ;) Only option is > to grab from svn right now. > > Michael > > > On Mon, Oct 28, 2013 at 7:00 AM, Stefanie Tauber > <stefanie.tauber at="" univie.ac.at="" <mailto:stefanie.tauber="" at="" univie.ac.at="">> wrote: > > Here is my SessionInfo, > GenomicRanges as well as IRanges should be up-to-date. > > R Under development (unstable) (2013-10-15 r64056) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsamtools_1.15.2 Biostrings_2.31.0 GenomicFeatures_1.15.2 > [4] AnnotationDbi_1.25.2 Biobase_2.23.1 GenomicRanges_1.15.1 > [7] XVector_0.3.0 IRanges_1.21.5 BiocGenerics_0.9.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.19.0 bitops_1.0-6 BSgenome_1.31.0 DBI_0.2-7 > [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.23.1 > stats4_3.1.0 > [9] tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 > > > Best, > Stefanie > > On Mo, 28.10.2013, 14:56, Michael Lawrence wrote: > > Probably need to update GenomicRanges. > > > > > > On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < > > stefanie.tauber at univie.ac.at > <mailto:stefanie.tauber at="" univie.ac.at="">> wrote: > > > >> Hi Herv?, > >> > >> I just wanted to try the new coverage function: > >> > >> # I read in a bam file, human data mapped against the transcriptome > >> x = readGAlignments(human_trans_bam) > >> > >> # This is my GAligments object > >> > x > >> GAlignments with 46765032 alignments and 0 metadata columns: > >> seqnames strand cigar > qwidth > >> <rle> <rle> <character> > <integer> > >> [1] hg19_ensGene_ENST00000237247 + 75M > 75 > >> [2] hg19_ensGene_ENST00000371036 - 75M > 75 > >> [3] hg19_ensGene_ENST00000371037 - 75M > 75 > >> [4] hg19_ensGene_ENST00000471889 - 75M > 75 > >> [5] hg19_ensGene_ENST00000377479 + 75M > 75 > >> ... ... ... ... > ... > >> [46765028] hg19_ensGene_ENST00000344867 - 75M > 75 > >> [46765029] hg19_ensGene_ENST00000400848 - 75M > 75 > >> [46765030] hg19_ensGene_ENST00000400848 - 75M > 75 > >> [46765031] hg19_ensGene_ENST00000400848 - 75M > 75 > >> [46765032] hg19_ensGene_ENST00000400829 - 75M > 75 > >> start end width ngap > >> <integer> <integer> <integer> <integer> > >> [1] 1783 1857 75 0 > >> [2] 690 764 75 0 > >> [3] 1632 1706 75 0 > >> [4] 2121 2195 75 0 > >> [5] 391 465 75 0 > >> ... ... ... ... ... > >> [46765028] 280 354 75 0 > >> [46765029] 509 583 75 0 > >> [46765030] 509 583 75 0 > >> [46765031] 509 583 75 0 > >> [46765032] 179 253 75 0 > >> --- > >> seqlengths: > >> hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 > >> 2580 ... 999 > >> > >> When I want to compute the coverage, I get the following: > >> > cov_x = coverage(x) > >> > >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > >> object '.Ranges.coverage' not found > >> > >> Do you know what I am doing wrong? > >> > >> Thanks, > >> Stefanie > >> > >> > >> > >> On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: > >> > Hi Stephanie, > >> > > >> > Just to let you know that in GenomicRanges 1.14.2, coverage() > should > >> > now be much faster on objects with many seqlevels. Should be about > >> > 1000x faster or more if you have 50 000 seqlevels. > >> > > >> > The only case where it's still slow is if many seqlevels are > marked as > >> > circular (and have a non-NA seqlength) in seqinfo(x), because for > >> > these seqlevels the coverage vector is still post processed by a > >> > loop in R. Could be a problem if for example people have BAM files > >> > with reads aligned to thousands of bacterial genomes. > >> > > >> > GenomicRanges 1.14.2 and IRanges 1.20.1 should become > available thru > >> > biocLite() in the next 24 hours or so. The man pages for the > >> "coverage" > >> > methods have been improved, especially in IRanges. > >> > > >> > Cheers, > >> > H. > >> > > >> > > >> > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: > >> >> Thanks for the answer! > >> >> Just wanted to make sure that I don't miss a relevant feature?. > >> >> > >> >> Best, > >> >> Stefanie > >> >> > >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> > >> >> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>>: > >> >> > >> >>> Hi Stephanie, > >> >>> > >> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: > >> >>>> What is the best way to compute the coverage on a > ReadGAlignments > >> >>>> object > >> >>>> when I have many, many (about 50 000) seqlevels? > >> >>>> > >> >>>> Just computing coverage(object) is terribly slow due to the > huge > >> >>>> amount of seqlevels. > >> >>>> > >> >>>> At the moment, I am subsetting the ReadGAlignments object > for each > >> >>>> seqlevel, > >> >>>> I omit the redundant seqlevels and calculate the coverage. > >> >>>> This is fast if you do it for a few seqlevels. > >> >>>> But if one really would like to compute the coverage for each > >> >>>> seqlevel individually, > >> >>>> this really takes a long time.. > >> >>> > >> >>> Yes this is a known weakness in the current implementation of > >> >>> coverage(). It loops over the seqlevels and this loop is > currently > >> >>> performed in R, which is slow. Unfortunately this is not the > first > >> >>> time someone complains about this. > >> >>> > >> >>> Time for us to bite the bullet I guess. I've just started to > work > >> >>> on moving the loop to the C-level. That should speed up things > >> >>> significantly. I'll let you know when this is ready. > >> >>> > >> >>> Thanks for the feedback. > >> >>> H. > >> >>> > >> >>>> > >> >>>> > >> >>>> I would appreciate any comments! > >> >>>> > >> >>>> Best, > >> >>>> Stefanie > >> >>>> > >> >>>> _______________________________________________ > >> >>>> Bioconductor mailing list > >> >>>> Bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org=""> > <mailto:bioconductor at="" r-project.org="" <mailto:bioconductor="" at="" r-project.org="">> > >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >>>> Search the archives: > >> >>>> > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> >>>> > >> >>> > >> >>> -- > >> >>> Hervé Pagès > >> >>> > >> >>> Program in Computational Biology > >> >>> Division of Public Health Sciences > >> >>> Fred Hutchinson Cancer Research Center > >> >>> 1100 Fairview Ave. N, M1-B514 > >> >>> P.O. Box 19024 > >> >>> Seattle, WA 98109-1024 > >> >>> > >> >>> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> > <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">> > >> >>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> > >> >>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > > > -- > _____________________________________________________________ > > Stefanie Tauber, PhD > > Center for Integrative Bioinformatics Vienna (CIBIV) > Max F. Perutz Laboratories (MFPL) > Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 > Dr. Bohr Gasse 9 > A-1030 Wien, Austria > Phone: ++43 +1 / 42772-4030 > Fax: ++43 +1 / 42772-4098 > email: stefanie.tauber at univie.ac.at > <mailto:stefanie.tauber at="" univie.ac.at=""> > > _____________________________________________________________ > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 11.5 years ago Hervé Pagès 16k

0

Entering edit mode

Its working, thanks! Best, Stefanie Am 28.10.2013 um 19:12 schrieb Hervé Pagès <hpages@fhcrc.org>: > Hi Stephanie, > > GenomicRanges 1.14.2 (in BioC release) is available since last Friday. > > In BioC devel though, other problems got in the way but GenomicRanges > 1.15.5 is finally out there (since a couple of hours). > > Cheers, > H. > > > On 10/28/2013 09:24 AM, Michael Lawrence wrote: >> Looks like there's a problem preventing newer GenomicRanges from being >> published. Doesn't look like it's my fault this time ;) Only option is >> to grab from svn right now. >> >> Michael >> >> >> On Mon, Oct 28, 2013 at 7:00 AM, Stefanie Tauber >> <stefanie.tauber@univie.ac.at <mailto:stefanie.tauber@univie.ac.at="">> wrote: >> >> Here is my SessionInfo, >> GenomicRanges as well as IRanges should be up-to-date. >> >> R Under development (unstable) (2013-10-15 r64056) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] Rsamtools_1.15.2 Biostrings_2.31.0 GenomicFeatures_1.15.2 >> [4] AnnotationDbi_1.25.2 Biobase_2.23.1 GenomicRanges_1.15.1 >> [7] XVector_0.3.0 IRanges_1.21.5 BiocGenerics_0.9.0 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.19.0 bitops_1.0-6 BSgenome_1.31.0 DBI_0.2-7 >> [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.23.1 >> stats4_3.1.0 >> [9] tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 >> >> >> Best, >> Stefanie >> >> On Mo, 28.10.2013, 14:56, Michael Lawrence wrote: >> > Probably need to update GenomicRanges. >> > >> > >> > On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < >> > stefanie.tauber@univie.ac.at >> <mailto:stefanie.tauber@univie.ac.at>> wrote: >> > >> >> Hi Hervé, >> >> >> >> I just wanted to try the new coverage function: >> >> >> >> # I read in a bam file, human data mapped against the transcriptome >> >> x = readGAlignments(human_trans_bam) >> >> >> >> # This is my GAligments object >> >> > x >> >> GAlignments with 46765032 alignments and 0 metadata columns: >> >> seqnames strand cigar >> qwidth >> >> <rle> <rle> <character> >> <integer> >> >> [1] hg19_ensGene_ENST00000237247 + 75M >> 75 >> >> [2] hg19_ensGene_ENST00000371036 - 75M >> 75 >> >> [3] hg19_ensGene_ENST00000371037 - 75M >> 75 >> >> [4] hg19_ensGene_ENST00000471889 - 75M >> 75 >> >> [5] hg19_ensGene_ENST00000377479 + 75M >> 75 >> >> ... ... ... ... >> ... >> >> [46765028] hg19_ensGene_ENST00000344867 - 75M >> 75 >> >> [46765029] hg19_ensGene_ENST00000400848 - 75M >> 75 >> >> [46765030] hg19_ensGene_ENST00000400848 - 75M >> 75 >> >> [46765031] hg19_ensGene_ENST00000400848 - 75M >> 75 >> >> [46765032] hg19_ensGene_ENST00000400829 - 75M >> 75 >> >> start end width ngap >> >> <integer> <integer> <integer> <integer> >> >> [1] 1783 1857 75 0 >> >> [2] 690 764 75 0 >> >> [3] 1632 1706 75 0 >> >> [4] 2121 2195 75 0 >> >> [5] 391 465 75 0 >> >> ... ... ... ... ... >> >> [46765028] 280 354 75 0 >> >> [46765029] 509 583 75 0 >> >> [46765030] 509 583 75 0 >> >> [46765031] 509 583 75 0 >> >> [46765032] 179 253 75 0 >> >> --- >> >> seqlengths: >> >> hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 >> >> 2580 ... 999 >> >> >> >> When I want to compute the coverage, I get the following: >> >> > cov_x = coverage(x) >> >> >> >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : >> >> object '.Ranges.coverage' not found >> >> >> >> Do you know what I am doing wrong? >> >> >> >> Thanks, >> >> Stefanie >> >> >> >> >> >> >> >> On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: >> >> > Hi Stephanie, >> >> > >> >> > Just to let you know that in GenomicRanges 1.14.2, coverage() >> should >> >> > now be much faster on objects with many seqlevels. Should be about >> >> > 1000x faster or more if you have 50 000 seqlevels. >> >> > >> >> > The only case where it's still slow is if many seqlevels are >> marked as >> >> > circular (and have a non-NA seqlength) in seqinfo(x), because for >> >> > these seqlevels the coverage vector is still post processed by a >> >> > loop in R. Could be a problem if for example people have BAM files >> >> > with reads aligned to thousands of bacterial genomes. >> >> > >> >> > GenomicRanges 1.14.2 and IRanges 1.20.1 should become >> available thru >> >> > biocLite() in the next 24 hours or so. The man pages for the >> >> "coverage" >> >> > methods have been improved, especially in IRanges. >> >> > >> >> > Cheers, >> >> > H. >> >> > >> >> > >> >> > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: >> >> >> Thanks for the answer! >> >> >> Just wanted to make sure that I don't miss a relevant feature . >> >> >> >> >> >> Best, >> >> >> Stefanie >> >> >> >> >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages@fhcrc.org>> <mailto:hpages@fhcrc.org> >> >> >> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">>>: >> >> >> >> >> >>> Hi Stephanie, >> >> >>> >> >> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >> >> >>>> What is the best way to compute the coverage on a >> ReadGAlignments >> >> >>>> object >> >> >>>> when I have many, many (about 50 000) seqlevels? >> >> >>>> >> >> >>>> Just computing coverage(object) is terribly slow due to the >> huge >> >> >>>> amount of seqlevels. >> >> >>>> >> >> >>>> At the moment, I am subsetting the ReadGAlignments object >> for each >> >> >>>> seqlevel, >> >> >>>> I omit the redundant seqlevels and calculate the coverage. >> >> >>>> This is fast if you do it for a few seqlevels. >> >> >>>> But if one really would like to compute the coverage for each >> >> >>>> seqlevel individually, >> >> >>>> this really takes a long time.. >> >> >>> >> >> >>> Yes this is a known weakness in the current implementation of >> >> >>> coverage(). It loops over the seqlevels and this loop is >> currently >> >> >>> performed in R, which is slow. Unfortunately this is not the >> first >> >> >>> time someone complains about this. >> >> >>> >> >> >>> Time for us to bite the bullet I guess. I've just started to >> work >> >> >>> on moving the loop to the C-level. That should speed up things >> >> >>> significantly. I'll let you know when this is ready. >> >> >>> >> >> >>> Thanks for the feedback. >> >> >>> H. >> >> >>> >> >> >>>> >> >> >>>> >> >> >>>> I would appreciate any comments! >> >> >>>> >> >> >>>> Best, >> >> >>>> Stefanie >> >> >>>> >> >> >>>> _______________________________________________ >> >> >>>> Bioconductor mailing list >> >> >>>> Bioconductor@r-project.org >> <mailto:bioconductor@r-project.org> >> <mailto:bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> >> >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> >>>> Search the archives: >> >> >>>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >>>> >> >> >>> >> >> >>> -- >> >> >>> Hervé Pagès >> >> >>> >> >> >>> Program in Computational Biology >> >> >>> Division of Public Health Sciences >> >> >>> Fred Hutchinson Cancer Research Center >> >> >>> 1100 Fairview Ave. N, M1-B514 >> >> >>> P.O. Box 19024 >> >> >>> Seattle, WA 98109-1024 >> >> >>> >> >> >>> E-mail: hpages@fhcrc.org <mailto:hpages@fhcrc.org> >> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">> >> >> >>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> >> >>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> > >> >> >> -- > > [[alternative HTML version deleted]]

ADD REPLY • link 11.5 years ago Stefanie ▴ 360

Login before adding your answer.