Computing the coverage on a bamfile with a huge amount of seqlevels
1
0
Entering edit mode
Stefanie ▴ 360
@stefanie-5192
Last seen 10.2 years ago
What is the best way to compute the coverage on a ReadGAlignments object when I have many, many (about 50 000) seqlevels? Just computing coverage(object) is terribly slow due to the huge amount of seqlevels. At the moment, I am subsetting the ReadGAlignments object for each seqlevel, I omit the redundant seqlevels and calculate the coverage. This is fast if you do it for a few seqlevels. But if one really would like to compute the coverage for each seqlevel individually, this really takes a long time.. I would appreciate any comments! Best, Stefanie
Coverage Coverage • 1.6k views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 21 hours ago
Seattle, WA, United States
Hi Stephanie, On 10/17/2013 07:37 AM, Stefanie Tauber wrote: > What is the best way to compute the coverage on a ReadGAlignments object > when I have many, many (about 50 000) seqlevels? > > Just computing coverage(object) is terribly slow due to the huge amount of seqlevels. > > At the moment, I am subsetting the ReadGAlignments object for each seqlevel, > I omit the redundant seqlevels and calculate the coverage. > This is fast if you do it for a few seqlevels. > But if one really would like to compute the coverage for each seqlevel individually, > this really takes a long time.. Yes this is a known weakness in the current implementation of coverage(). It loops over the seqlevels and this loop is currently performed in R, which is slow. Unfortunately this is not the first time someone complains about this. Time for us to bite the bullet I guess. I've just started to work on moving the loop to the C-level. That should speed up things significantly. I'll let you know when this is ready. Thanks for the feedback. H. > > > I would appreciate any comments! > > Best, > Stefanie > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENT
0
Entering edit mode
Thanks for the answer! Just wanted to make sure that I don't miss a relevant feature . Best, Stefanie Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages@fhcrc.org>: > Hi Stephanie, > > On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >> What is the best way to compute the coverage on a ReadGAlignments object >> when I have many, many (about 50 000) seqlevels? >> >> Just computing coverage(object) is terribly slow due to the huge amount of seqlevels. >> >> At the moment, I am subsetting the ReadGAlignments object for each seqlevel, >> I omit the redundant seqlevels and calculate the coverage. >> This is fast if you do it for a few seqlevels. >> But if one really would like to compute the coverage for each seqlevel individually, >> this really takes a long time.. > > Yes this is a known weakness in the current implementation of > coverage(). It loops over the seqlevels and this loop is currently > performed in R, which is slow. Unfortunately this is not the first > time someone complains about this. > > Time for us to bite the bullet I guess. I've just started to work > on moving the loop to the C-level. That should speed up things > significantly. I'll let you know when this is ready. > > Thanks for the feedback. > H. > >> >> >> I would appreciate any comments! >> >> Best, >> Stefanie >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Stephanie, Just to let you know that in GenomicRanges 1.14.2, coverage() should now be much faster on objects with many seqlevels. Should be about 1000x faster or more if you have 50 000 seqlevels. The only case where it's still slow is if many seqlevels are marked as circular (and have a non-NA seqlength) in seqinfo(x), because for these seqlevels the coverage vector is still post processed by a loop in R. Could be a problem if for example people have BAM files with reads aligned to thousands of bacterial genomes. GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru biocLite() in the next 24 hours or so. The man pages for the "coverage" methods have been improved, especially in IRanges. Cheers, H. On 10/18/2013 12:19 AM, Stefanie Tauber wrote: > Thanks for the answer! > Just wanted to make sure that I don't miss a relevant feature?. > > Best, > Stefanie > > Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org="">>: > >> Hi Stephanie, >> >> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >>> What is the best way to compute the coverage on a ReadGAlignments object >>> when I have many, many (about 50 000) seqlevels? >>> >>> Just computing coverage(object) is terribly slow due to the huge >>> amount of seqlevels. >>> >>> At the moment, I am subsetting the ReadGAlignments object for each >>> seqlevel, >>> I omit the redundant seqlevels and calculate the coverage. >>> This is fast if you do it for a few seqlevels. >>> But if one really would like to compute the coverage for each >>> seqlevel individually, >>> this really takes a long time.. >> >> Yes this is a known weakness in the current implementation of >> coverage(). It loops over the seqlevels and this loop is currently >> performed in R, which is slow. Unfortunately this is not the first >> time someone complains about this. >> >> Time for us to bite the bullet I guess. I've just started to work >> on moving the loop to the C-level. That should speed up things >> significantly. I'll let you know when this is ready. >> >> Thanks for the feedback. >> H. >> >>> >>> >>> I would appreciate any comments! >>> >>> Best, >>> Stefanie >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Thanks a lot! Can?t wait to try the new version! Best, Stefanie Am 25.10.2013 um 12:44 schrieb Hervé Pagès <hpages at="" fhcrc.org="">: > Hi Stephanie, > > Just to let you know that in GenomicRanges 1.14.2, coverage() should > now be much faster on objects with many seqlevels. Should be about > 1000x faster or more if you have 50 000 seqlevels. > > The only case where it's still slow is if many seqlevels are marked as > circular (and have a non-NA seqlength) in seqinfo(x), because for > these seqlevels the coverage vector is still post processed by a > loop in R. Could be a problem if for example people have BAM files > with reads aligned to thousands of bacterial genomes. > > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru > biocLite() in the next 24 hours or so. The man pages for the "coverage" > methods have been improved, especially in IRanges. > > Cheers, > H. > > > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: >> Thanks for the answer! >> Just wanted to make sure that I don't miss a relevant feature?. >> >> Best, >> Stefanie >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org="">> <mailto:hpages at="" fhcrc.org="">>: >> >>> Hi Stephanie, >>> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >>>> What is the best way to compute the coverage on a ReadGAlignments object >>>> when I have many, many (about 50 000) seqlevels? >>>> >>>> Just computing coverage(object) is terribly slow due to the huge >>>> amount of seqlevels. >>>> >>>> At the moment, I am subsetting the ReadGAlignments object for each >>>> seqlevel, >>>> I omit the redundant seqlevels and calculate the coverage. >>>> This is fast if you do it for a few seqlevels. >>>> But if one really would like to compute the coverage for each >>>> seqlevel individually, >>>> this really takes a long time.. >>> >>> Yes this is a known weakness in the current implementation of >>> coverage(). It loops over the seqlevels and this loop is currently >>> performed in R, which is slow. Unfortunately this is not the first >>> time someone complains about this. >>> >>> Time for us to bite the bullet I guess. I've just started to work >>> on moving the loop to the C-level. That should speed up things >>> significantly. I'll let you know when this is ready. >>> >>> Thanks for the feedback. >>> H. >>> >>>> >>>> >>>> I would appreciate any comments! >>>> >>>> Best, >>>> Stefanie >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >> >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Hi Herv?, I just wanted to try the new coverage function: # I read in a bam file, human data mapped against the transcriptome x = readGAlignments(human_trans_bam) # This is my GAligments object > x GAlignments with 46765032 alignments and 0 metadata columns: seqnames strand cigar qwidth <rle> <rle> <character> <integer> [1] hg19_ensGene_ENST00000237247 + 75M 75 [2] hg19_ensGene_ENST00000371036 - 75M 75 [3] hg19_ensGene_ENST00000371037 - 75M 75 [4] hg19_ensGene_ENST00000471889 - 75M 75 [5] hg19_ensGene_ENST00000377479 + 75M 75 ... ... ... ... ... [46765028] hg19_ensGene_ENST00000344867 - 75M 75 [46765029] hg19_ensGene_ENST00000400848 - 75M 75 [46765030] hg19_ensGene_ENST00000400848 - 75M 75 [46765031] hg19_ensGene_ENST00000400848 - 75M 75 [46765032] hg19_ensGene_ENST00000400829 - 75M 75 start end width ngap <integer> <integer> <integer> <integer> [1] 1783 1857 75 0 [2] 690 764 75 0 [3] 1632 1706 75 0 [4] 2121 2195 75 0 [5] 391 465 75 0 ... ... ... ... ... [46765028] 280 354 75 0 [46765029] 509 583 75 0 [46765030] 509 583 75 0 [46765031] 509 583 75 0 [46765032] 179 253 75 0 --- seqlengths: hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 2580 ... 999 When I want to compute the coverage, I get the following: > cov_x = coverage(x) Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : object '.Ranges.coverage' not found Do you know what I am doing wrong? Thanks, Stefanie On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: > Hi Stephanie, > > Just to let you know that in GenomicRanges 1.14.2, coverage() should > now be much faster on objects with many seqlevels. Should be about > 1000x faster or more if you have 50 000 seqlevels. > > The only case where it's still slow is if many seqlevels are marked as > circular (and have a non-NA seqlength) in seqinfo(x), because for > these seqlevels the coverage vector is still post processed by a > loop in R. Could be a problem if for example people have BAM files > with reads aligned to thousands of bacterial genomes. > > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru > biocLite() in the next 24 hours or so. The man pages for the "coverage" > methods have been improved, especially in IRanges. > > Cheers, > H. > > > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: >> Thanks for the answer! >> Just wanted to make sure that I don't miss a relevant feature?. >> >> Best, >> Stefanie >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org="">> <mailto:hpages at="" fhcrc.org="">>: >> >>> Hi Stephanie, >>> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >>>> What is the best way to compute the coverage on a ReadGAlignments >>>> object >>>> when I have many, many (about 50 000) seqlevels? >>>> >>>> Just computing coverage(object) is terribly slow due to the huge >>>> amount of seqlevels. >>>> >>>> At the moment, I am subsetting the ReadGAlignments object for each >>>> seqlevel, >>>> I omit the redundant seqlevels and calculate the coverage. >>>> This is fast if you do it for a few seqlevels. >>>> But if one really would like to compute the coverage for each >>>> seqlevel individually, >>>> this really takes a long time.. >>> >>> Yes this is a known weakness in the current implementation of >>> coverage(). It loops over the seqlevels and this loop is currently >>> performed in R, which is slow. Unfortunately this is not the first >>> time someone complains about this. >>> >>> Time for us to bite the bullet I guess. I've just started to work >>> on moving the loop to the C-level. That should speed up things >>> significantly. I'll let you know when this is ready. >>> >>> Thanks for the feedback. >>> H. >>> >>>> >>>> >>>> I would appreciate any comments! >>>> >>>> Best, >>>> Stefanie >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Probably need to update GenomicRanges. On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < stefanie.tauber@univie.ac.at> wrote: > Hi Hervé, > > I just wanted to try the new coverage function: > > # I read in a bam file, human data mapped against the transcriptome > x = readGAlignments(human_trans_bam) > > # This is my GAligments object > > x > GAlignments with 46765032 alignments and 0 metadata columns: > seqnames strand cigar qwidth > <rle> <rle> <character> <integer> > [1] hg19_ensGene_ENST00000237247 + 75M 75 > [2] hg19_ensGene_ENST00000371036 - 75M 75 > [3] hg19_ensGene_ENST00000371037 - 75M 75 > [4] hg19_ensGene_ENST00000471889 - 75M 75 > [5] hg19_ensGene_ENST00000377479 + 75M 75 > ... ... ... ... ... > [46765028] hg19_ensGene_ENST00000344867 - 75M 75 > [46765029] hg19_ensGene_ENST00000400848 - 75M 75 > [46765030] hg19_ensGene_ENST00000400848 - 75M 75 > [46765031] hg19_ensGene_ENST00000400848 - 75M 75 > [46765032] hg19_ensGene_ENST00000400829 - 75M 75 > start end width ngap > <integer> <integer> <integer> <integer> > [1] 1783 1857 75 0 > [2] 690 764 75 0 > [3] 1632 1706 75 0 > [4] 2121 2195 75 0 > [5] 391 465 75 0 > ... ... ... ... ... > [46765028] 280 354 75 0 > [46765029] 509 583 75 0 > [46765030] 509 583 75 0 > [46765031] 509 583 75 0 > [46765032] 179 253 75 0 > --- > seqlengths: > hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 > 2580 ... 999 > > When I want to compute the coverage, I get the following: > > cov_x = coverage(x) > > Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > object '.Ranges.coverage' not found > > Do you know what I am doing wrong? > > Thanks, > Stefanie > > > > On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: > > Hi Stephanie, > > > > Just to let you know that in GenomicRanges 1.14.2, coverage() should > > now be much faster on objects with many seqlevels. Should be about > > 1000x faster or more if you have 50 000 seqlevels. > > > > The only case where it's still slow is if many seqlevels are marked as > > circular (and have a non-NA seqlength) in seqinfo(x), because for > > these seqlevels the coverage vector is still post processed by a > > loop in R. Could be a problem if for example people have BAM files > > with reads aligned to thousands of bacterial genomes. > > > > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru > > biocLite() in the next 24 hours or so. The man pages for the "coverage" > > methods have been improved, especially in IRanges. > > > > Cheers, > > H. > > > > > > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: > >> Thanks for the answer! > >> Just wanted to make sure that I don't miss a relevant feature . > >> > >> Best, > >> Stefanie > >> > >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages@fhcrc.org> >> <mailto:hpages@fhcrc.org>>: > >> > >>> Hi Stephanie, > >>> > >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: > >>>> What is the best way to compute the coverage on a ReadGAlignments > >>>> object > >>>> when I have many, many (about 50 000) seqlevels? > >>>> > >>>> Just computing coverage(object) is terribly slow due to the huge > >>>> amount of seqlevels. > >>>> > >>>> At the moment, I am subsetting the ReadGAlignments object for each > >>>> seqlevel, > >>>> I omit the redundant seqlevels and calculate the coverage. > >>>> This is fast if you do it for a few seqlevels. > >>>> But if one really would like to compute the coverage for each > >>>> seqlevel individually, > >>>> this really takes a long time.. > >>> > >>> Yes this is a known weakness in the current implementation of > >>> coverage(). It loops over the seqlevels and this loop is currently > >>> performed in R, which is slow. Unfortunately this is not the first > >>> time someone complains about this. > >>> > >>> Time for us to bite the bullet I guess. I've just started to work > >>> on moving the loop to the C-level. That should speed up things > >>> significantly. I'll let you know when this is ready. > >>> > >>> Thanks for the feedback. > >>> H. > >>> > >>>> > >>>> > >>>> I would appreciate any comments! > >>>> > >>>> Best, > >>>> Stefanie > >>>> > >>>> _______________________________________________ > >>>> Bioconductor mailing list > >>>> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>>> Search the archives: > >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>>> > >>> > >>> -- > >>> Hervé Pagès > >>> > >>> Program in Computational Biology > >>> Division of Public Health Sciences > >>> Fred Hutchinson Cancer Research Center > >>> 1100 Fairview Ave. N, M1-B514 > >>> P.O. Box 19024 > >>> Seattle, WA 98109-1024 > >>> > >>> E-mail: hpages@fhcrc.org <mailto:hpages@fhcrc.org> > >>> Phone: (206) 667-5791 > >>> Fax: (206) 667-1319 > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Here is my SessionInfo, GenomicRanges as well as IRanges should be up-to-date. R Under development (unstable) (2013-10-15 r64056) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsamtools_1.15.2 Biostrings_2.31.0 GenomicFeatures_1.15.2 [4] AnnotationDbi_1.25.2 Biobase_2.23.1 GenomicRanges_1.15.1 [7] XVector_0.3.0 IRanges_1.21.5 BiocGenerics_0.9.0 loaded via a namespace (and not attached): [1] biomaRt_2.19.0 bitops_1.0-6 BSgenome_1.31.0 DBI_0.2-7 [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.23.1 stats4_3.1.0 [9] tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 Best, Stefanie On Mo, 28.10.2013, 14:56, Michael Lawrence wrote: > Probably need to update GenomicRanges. > > > On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < > stefanie.tauber at univie.ac.at> wrote: > >> Hi Herv?, >> >> I just wanted to try the new coverage function: >> >> # I read in a bam file, human data mapped against the transcriptome >> x = readGAlignments(human_trans_bam) >> >> # This is my GAligments object >> > x >> GAlignments with 46765032 alignments and 0 metadata columns: >> seqnames strand cigar qwidth >> <rle> <rle> <character> <integer> >> [1] hg19_ensGene_ENST00000237247 + 75M 75 >> [2] hg19_ensGene_ENST00000371036 - 75M 75 >> [3] hg19_ensGene_ENST00000371037 - 75M 75 >> [4] hg19_ensGene_ENST00000471889 - 75M 75 >> [5] hg19_ensGene_ENST00000377479 + 75M 75 >> ... ... ... ... ... >> [46765028] hg19_ensGene_ENST00000344867 - 75M 75 >> [46765029] hg19_ensGene_ENST00000400848 - 75M 75 >> [46765030] hg19_ensGene_ENST00000400848 - 75M 75 >> [46765031] hg19_ensGene_ENST00000400848 - 75M 75 >> [46765032] hg19_ensGene_ENST00000400829 - 75M 75 >> start end width ngap >> <integer> <integer> <integer> <integer> >> [1] 1783 1857 75 0 >> [2] 690 764 75 0 >> [3] 1632 1706 75 0 >> [4] 2121 2195 75 0 >> [5] 391 465 75 0 >> ... ... ... ... ... >> [46765028] 280 354 75 0 >> [46765029] 509 583 75 0 >> [46765030] 509 583 75 0 >> [46765031] 509 583 75 0 >> [46765032] 179 253 75 0 >> --- >> seqlengths: >> hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 >> 2580 ... 999 >> >> When I want to compute the coverage, I get the following: >> > cov_x = coverage(x) >> >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : >> object '.Ranges.coverage' not found >> >> Do you know what I am doing wrong? >> >> Thanks, >> Stefanie >> >> >> >> On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: >> > Hi Stephanie, >> > >> > Just to let you know that in GenomicRanges 1.14.2, coverage() should >> > now be much faster on objects with many seqlevels. Should be about >> > 1000x faster or more if you have 50 000 seqlevels. >> > >> > The only case where it's still slow is if many seqlevels are marked as >> > circular (and have a non-NA seqlength) in seqinfo(x), because for >> > these seqlevels the coverage vector is still post processed by a >> > loop in R. Could be a problem if for example people have BAM files >> > with reads aligned to thousands of bacterial genomes. >> > >> > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru >> > biocLite() in the next 24 hours or so. The man pages for the >> "coverage" >> > methods have been improved, especially in IRanges. >> > >> > Cheers, >> > H. >> > >> > >> > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: >> >> Thanks for the answer! >> >> Just wanted to make sure that I don't miss a relevant feature?. >> >> >> >> Best, >> >> Stefanie >> >> >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org="">> >> <mailto:hpages at="" fhcrc.org="">>: >> >> >> >>> Hi Stephanie, >> >>> >> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >> >>>> What is the best way to compute the coverage on a ReadGAlignments >> >>>> object >> >>>> when I have many, many (about 50 000) seqlevels? >> >>>> >> >>>> Just computing coverage(object) is terribly slow due to the huge >> >>>> amount of seqlevels. >> >>>> >> >>>> At the moment, I am subsetting the ReadGAlignments object for each >> >>>> seqlevel, >> >>>> I omit the redundant seqlevels and calculate the coverage. >> >>>> This is fast if you do it for a few seqlevels. >> >>>> But if one really would like to compute the coverage for each >> >>>> seqlevel individually, >> >>>> this really takes a long time.. >> >>> >> >>> Yes this is a known weakness in the current implementation of >> >>> coverage(). It loops over the seqlevels and this loop is currently >> >>> performed in R, which is slow. Unfortunately this is not the first >> >>> time someone complains about this. >> >>> >> >>> Time for us to bite the bullet I guess. I've just started to work >> >>> on moving the loop to the C-level. That should speed up things >> >>> significantly. I'll let you know when this is ready. >> >>> >> >>> Thanks for the feedback. >> >>> H. >> >>> >> >>>> >> >>>> >> >>>> I would appreciate any comments! >> >>>> >> >>>> Best, >> >>>> Stefanie >> >>>> >> >>>> _______________________________________________ >> >>>> Bioconductor mailing list >> >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >>>> Search the archives: >> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >>>> >> >>> >> >>> -- >> >>> Hervé Pagès >> >>> >> >>> Program in Computational Biology >> >>> Division of Public Health Sciences >> >>> Fred Hutchinson Cancer Research Center >> >>> 1100 Fairview Ave. N, M1-B514 >> >>> P.O. Box 19024 >> >>> Seattle, WA 98109-1024 >> >>> >> >>> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> >> >>> Phone: (206) 667-5791 >> >>> Fax: (206) 667-1319 >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > -- _____________________________________________________________ Stefanie Tauber, PhD Center for Integrative Bioinformatics Vienna (CIBIV) Max F. Perutz Laboratories (MFPL) Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 Dr. Bohr Gasse 9 A-1030 Wien, Austria Phone: ++43 +1 / 42772-4030 Fax: ++43 +1 / 42772-4098 email: stefanie.tauber at univie.ac.at
ADD REPLY
0
Entering edit mode
Looks like there's a problem preventing newer GenomicRanges from being published. Doesn't look like it's my fault this time ;) Only option is to grab from svn right now. Michael On Mon, Oct 28, 2013 at 7:00 AM, Stefanie Tauber < stefanie.tauber@univie.ac.at> wrote: > Here is my SessionInfo, > GenomicRanges as well as IRanges should be up-to-date. > > R Under development (unstable) (2013-10-15 r64056) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsamtools_1.15.2 Biostrings_2.31.0 GenomicFeatures_1.15.2 > [4] AnnotationDbi_1.25.2 Biobase_2.23.1 GenomicRanges_1.15.1 > [7] XVector_0.3.0 IRanges_1.21.5 BiocGenerics_0.9.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.19.0 bitops_1.0-6 BSgenome_1.31.0 DBI_0.2-7 > [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.23.1 stats4_3.1.0 > [9] tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 > > > Best, > Stefanie > > On Mo, 28.10.2013, 14:56, Michael Lawrence wrote: > > Probably need to update GenomicRanges. > > > > > > On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < > > stefanie.tauber@univie.ac.at> wrote: > > > >> Hi Hervé, > >> > >> I just wanted to try the new coverage function: > >> > >> # I read in a bam file, human data mapped against the transcriptome > >> x = readGAlignments(human_trans_bam) > >> > >> # This is my GAligments object > >> > x > >> GAlignments with 46765032 alignments and 0 metadata columns: > >> seqnames strand cigar qwidth > >> <rle> <rle> <character> <integer> > >> [1] hg19_ensGene_ENST00000237247 + 75M 75 > >> [2] hg19_ensGene_ENST00000371036 - 75M 75 > >> [3] hg19_ensGene_ENST00000371037 - 75M 75 > >> [4] hg19_ensGene_ENST00000471889 - 75M 75 > >> [5] hg19_ensGene_ENST00000377479 + 75M 75 > >> ... ... ... ... ... > >> [46765028] hg19_ensGene_ENST00000344867 - 75M 75 > >> [46765029] hg19_ensGene_ENST00000400848 - 75M 75 > >> [46765030] hg19_ensGene_ENST00000400848 - 75M 75 > >> [46765031] hg19_ensGene_ENST00000400848 - 75M 75 > >> [46765032] hg19_ensGene_ENST00000400829 - 75M 75 > >> start end width ngap > >> <integer> <integer> <integer> <integer> > >> [1] 1783 1857 75 0 > >> [2] 690 764 75 0 > >> [3] 1632 1706 75 0 > >> [4] 2121 2195 75 0 > >> [5] 391 465 75 0 > >> ... ... ... ... ... > >> [46765028] 280 354 75 0 > >> [46765029] 509 583 75 0 > >> [46765030] 509 583 75 0 > >> [46765031] 509 583 75 0 > >> [46765032] 179 253 75 0 > >> --- > >> seqlengths: > >> hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 > >> 2580 ... 999 > >> > >> When I want to compute the coverage, I get the following: > >> > cov_x = coverage(x) > >> > >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > >> object '.Ranges.coverage' not found > >> > >> Do you know what I am doing wrong? > >> > >> Thanks, > >> Stefanie > >> > >> > >> > >> On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: > >> > Hi Stephanie, > >> > > >> > Just to let you know that in GenomicRanges 1.14.2, coverage() should > >> > now be much faster on objects with many seqlevels. Should be about > >> > 1000x faster or more if you have 50 000 seqlevels. > >> > > >> > The only case where it's still slow is if many seqlevels are marked as > >> > circular (and have a non-NA seqlength) in seqinfo(x), because for > >> > these seqlevels the coverage vector is still post processed by a > >> > loop in R. Could be a problem if for example people have BAM files > >> > with reads aligned to thousands of bacterial genomes. > >> > > >> > GenomicRanges 1.14.2 and IRanges 1.20.1 should become available thru > >> > biocLite() in the next 24 hours or so. The man pages for the > >> "coverage" > >> > methods have been improved, especially in IRanges. > >> > > >> > Cheers, > >> > H. > >> > > >> > > >> > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: > >> >> Thanks for the answer! > >> >> Just wanted to make sure that I don't miss a relevant feature . > >> >> > >> >> Best, > >> >> Stefanie > >> >> > >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages@fhcrc.org> >> >> <mailto:hpages@fhcrc.org>>: > >> >> > >> >>> Hi Stephanie, > >> >>> > >> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: > >> >>>> What is the best way to compute the coverage on a ReadGAlignments > >> >>>> object > >> >>>> when I have many, many (about 50 000) seqlevels? > >> >>>> > >> >>>> Just computing coverage(object) is terribly slow due to the huge > >> >>>> amount of seqlevels. > >> >>>> > >> >>>> At the moment, I am subsetting the ReadGAlignments object for each > >> >>>> seqlevel, > >> >>>> I omit the redundant seqlevels and calculate the coverage. > >> >>>> This is fast if you do it for a few seqlevels. > >> >>>> But if one really would like to compute the coverage for each > >> >>>> seqlevel individually, > >> >>>> this really takes a long time.. > >> >>> > >> >>> Yes this is a known weakness in the current implementation of > >> >>> coverage(). It loops over the seqlevels and this loop is currently > >> >>> performed in R, which is slow. Unfortunately this is not the first > >> >>> time someone complains about this. > >> >>> > >> >>> Time for us to bite the bullet I guess. I've just started to work > >> >>> on moving the loop to the C-level. That should speed up things > >> >>> significantly. I'll let you know when this is ready. > >> >>> > >> >>> Thanks for the feedback. > >> >>> H. > >> >>> > >> >>>> > >> >>>> > >> >>>> I would appreciate any comments! > >> >>>> > >> >>>> Best, > >> >>>> Stefanie > >> >>>> > >> >>>> _______________________________________________ > >> >>>> Bioconductor mailing list > >> >>>> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> > >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >>>> Search the archives: > >> >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> >>>> > >> >>> > >> >>> -- > >> >>> Hervé Pagès > >> >>> > >> >>> Program in Computational Biology > >> >>> Division of Public Health Sciences > >> >>> Fred Hutchinson Cancer Research Center > >> >>> 1100 Fairview Ave. N, M1-B514 > >> >>> P.O. Box 19024 > >> >>> Seattle, WA 98109-1024 > >> >>> > >> >>> E-mail: hpages@fhcrc.org <mailto:hpages@fhcrc.org> > >> >>> Phone: (206) 667-5791 > >> >>> Fax: (206) 667-1319 > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > > > -- > _____________________________________________________________ > > Stefanie Tauber, PhD > > Center for Integrative Bioinformatics Vienna (CIBIV) > Max F. Perutz Laboratories (MFPL) > Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 > Dr. Bohr Gasse 9 > A-1030 Wien, Austria > Phone: ++43 +1 / 42772-4030 > Fax: ++43 +1 / 42772-4098 > email: stefanie.tauber@univie.ac.at > > _____________________________________________________________ > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Stephanie, GenomicRanges 1.14.2 (in BioC release) is available since last Friday. In BioC devel though, other problems got in the way but GenomicRanges 1.15.5 is finally out there (since a couple of hours). Cheers, H. On 10/28/2013 09:24 AM, Michael Lawrence wrote: > Looks like there's a problem preventing newer GenomicRanges from being > published. Doesn't look like it's my fault this time ;) Only option is > to grab from svn right now. > > Michael > > > On Mon, Oct 28, 2013 at 7:00 AM, Stefanie Tauber > <stefanie.tauber at="" univie.ac.at="" <mailto:stefanie.tauber="" at="" univie.ac.at="">> wrote: > > Here is my SessionInfo, > GenomicRanges as well as IRanges should be up-to-date. > > R Under development (unstable) (2013-10-15 r64056) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsamtools_1.15.2 Biostrings_2.31.0 GenomicFeatures_1.15.2 > [4] AnnotationDbi_1.25.2 Biobase_2.23.1 GenomicRanges_1.15.1 > [7] XVector_0.3.0 IRanges_1.21.5 BiocGenerics_0.9.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.19.0 bitops_1.0-6 BSgenome_1.31.0 DBI_0.2-7 > [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.23.1 > stats4_3.1.0 > [9] tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 > > > Best, > Stefanie > > On Mo, 28.10.2013, 14:56, Michael Lawrence wrote: > > Probably need to update GenomicRanges. > > > > > > On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < > > stefanie.tauber at univie.ac.at > <mailto:stefanie.tauber at="" univie.ac.at="">> wrote: > > > >> Hi Herv?, > >> > >> I just wanted to try the new coverage function: > >> > >> # I read in a bam file, human data mapped against the transcriptome > >> x = readGAlignments(human_trans_bam) > >> > >> # This is my GAligments object > >> > x > >> GAlignments with 46765032 alignments and 0 metadata columns: > >> seqnames strand cigar > qwidth > >> <rle> <rle> <character> > <integer> > >> [1] hg19_ensGene_ENST00000237247 + 75M > 75 > >> [2] hg19_ensGene_ENST00000371036 - 75M > 75 > >> [3] hg19_ensGene_ENST00000371037 - 75M > 75 > >> [4] hg19_ensGene_ENST00000471889 - 75M > 75 > >> [5] hg19_ensGene_ENST00000377479 + 75M > 75 > >> ... ... ... ... > ... > >> [46765028] hg19_ensGene_ENST00000344867 - 75M > 75 > >> [46765029] hg19_ensGene_ENST00000400848 - 75M > 75 > >> [46765030] hg19_ensGene_ENST00000400848 - 75M > 75 > >> [46765031] hg19_ensGene_ENST00000400848 - 75M > 75 > >> [46765032] hg19_ensGene_ENST00000400829 - 75M > 75 > >> start end width ngap > >> <integer> <integer> <integer> <integer> > >> [1] 1783 1857 75 0 > >> [2] 690 764 75 0 > >> [3] 1632 1706 75 0 > >> [4] 2121 2195 75 0 > >> [5] 391 465 75 0 > >> ... ... ... ... ... > >> [46765028] 280 354 75 0 > >> [46765029] 509 583 75 0 > >> [46765030] 509 583 75 0 > >> [46765031] 509 583 75 0 > >> [46765032] 179 253 75 0 > >> --- > >> seqlengths: > >> hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 > >> 2580 ... 999 > >> > >> When I want to compute the coverage, I get the following: > >> > cov_x = coverage(x) > >> > >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : > >> object '.Ranges.coverage' not found > >> > >> Do you know what I am doing wrong? > >> > >> Thanks, > >> Stefanie > >> > >> > >> > >> On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: > >> > Hi Stephanie, > >> > > >> > Just to let you know that in GenomicRanges 1.14.2, coverage() > should > >> > now be much faster on objects with many seqlevels. Should be about > >> > 1000x faster or more if you have 50 000 seqlevels. > >> > > >> > The only case where it's still slow is if many seqlevels are > marked as > >> > circular (and have a non-NA seqlength) in seqinfo(x), because for > >> > these seqlevels the coverage vector is still post processed by a > >> > loop in R. Could be a problem if for example people have BAM files > >> > with reads aligned to thousands of bacterial genomes. > >> > > >> > GenomicRanges 1.14.2 and IRanges 1.20.1 should become > available thru > >> > biocLite() in the next 24 hours or so. The man pages for the > >> "coverage" > >> > methods have been improved, especially in IRanges. > >> > > >> > Cheers, > >> > H. > >> > > >> > > >> > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: > >> >> Thanks for the answer! > >> >> Just wanted to make sure that I don't miss a relevant feature?. > >> >> > >> >> Best, > >> >> Stefanie > >> >> > >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages at="" fhcrc.org=""> <mailto:hpages at="" fhcrc.org=""> > >> >> <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">>>: > >> >> > >> >>> Hi Stephanie, > >> >>> > >> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: > >> >>>> What is the best way to compute the coverage on a > ReadGAlignments > >> >>>> object > >> >>>> when I have many, many (about 50 000) seqlevels? > >> >>>> > >> >>>> Just computing coverage(object) is terribly slow due to the > huge > >> >>>> amount of seqlevels. > >> >>>> > >> >>>> At the moment, I am subsetting the ReadGAlignments object > for each > >> >>>> seqlevel, > >> >>>> I omit the redundant seqlevels and calculate the coverage. > >> >>>> This is fast if you do it for a few seqlevels. > >> >>>> But if one really would like to compute the coverage for each > >> >>>> seqlevel individually, > >> >>>> this really takes a long time.. > >> >>> > >> >>> Yes this is a known weakness in the current implementation of > >> >>> coverage(). It loops over the seqlevels and this loop is > currently > >> >>> performed in R, which is slow. Unfortunately this is not the > first > >> >>> time someone complains about this. > >> >>> > >> >>> Time for us to bite the bullet I guess. I've just started to > work > >> >>> on moving the loop to the C-level. That should speed up things > >> >>> significantly. I'll let you know when this is ready. > >> >>> > >> >>> Thanks for the feedback. > >> >>> H. > >> >>> > >> >>>> > >> >>>> > >> >>>> I would appreciate any comments! > >> >>>> > >> >>>> Best, > >> >>>> Stefanie > >> >>>> > >> >>>> _______________________________________________ > >> >>>> Bioconductor mailing list > >> >>>> Bioconductor at r-project.org > <mailto:bioconductor at="" r-project.org=""> > <mailto:bioconductor at="" r-project.org="" <mailto:bioconductor="" at="" r-project.org="">> > >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> >>>> Search the archives: > >> >>>> > http://news.gmane.org/gmane.science.biology.informatics.conductor > >> >>>> > >> >>> > >> >>> -- > >> >>> Hervé Pagès > >> >>> > >> >>> Program in Computational Biology > >> >>> Division of Public Health Sciences > >> >>> Fred Hutchinson Cancer Research Center > >> >>> 1100 Fairview Ave. N, M1-B514 > >> >>> P.O. Box 19024 > >> >>> Seattle, WA 98109-1024 > >> >>> > >> >>> E-mail: hpages at fhcrc.org <mailto:hpages at="" fhcrc.org=""> > <mailto:hpages at="" fhcrc.org="" <mailto:hpages="" at="" fhcrc.org="">> > >> >>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> > >> >>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > > > -- > _____________________________________________________________ > > Stefanie Tauber, PhD > > Center for Integrative Bioinformatics Vienna (CIBIV) > Max F. Perutz Laboratories (MFPL) > Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 > Dr. Bohr Gasse 9 > A-1030 Wien, Austria > Phone: ++43 +1 / 42772-4030 > Fax: ++43 +1 / 42772-4098 > email: stefanie.tauber at univie.ac.at > <mailto:stefanie.tauber at="" univie.ac.at=""> > > _____________________________________________________________ > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Its working, thanks! Best, Stefanie Am 28.10.2013 um 19:12 schrieb Hervé Pagès <hpages@fhcrc.org>: > Hi Stephanie, > > GenomicRanges 1.14.2 (in BioC release) is available since last Friday. > > In BioC devel though, other problems got in the way but GenomicRanges > 1.15.5 is finally out there (since a couple of hours). > > Cheers, > H. > > > On 10/28/2013 09:24 AM, Michael Lawrence wrote: >> Looks like there's a problem preventing newer GenomicRanges from being >> published. Doesn't look like it's my fault this time ;) Only option is >> to grab from svn right now. >> >> Michael >> >> >> On Mon, Oct 28, 2013 at 7:00 AM, Stefanie Tauber >> <stefanie.tauber@univie.ac.at <mailto:stefanie.tauber@univie.ac.at="">> wrote: >> >> Here is my SessionInfo, >> GenomicRanges as well as IRanges should be up-to-date. >> >> R Under development (unstable) (2013-10-15 r64056) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] Rsamtools_1.15.2 Biostrings_2.31.0 GenomicFeatures_1.15.2 >> [4] AnnotationDbi_1.25.2 Biobase_2.23.1 GenomicRanges_1.15.1 >> [7] XVector_0.3.0 IRanges_1.21.5 BiocGenerics_0.9.0 >> >> loaded via a namespace (and not attached): >> [1] biomaRt_2.19.0 bitops_1.0-6 BSgenome_1.31.0 DBI_0.2-7 >> [5] RCurl_1.95-4.1 RSQLite_0.11.4 rtracklayer_1.23.1 >> stats4_3.1.0 >> [9] tools_3.1.0 XML_3.98-1.1 zlibbioc_1.9.0 >> >> >> Best, >> Stefanie >> >> On Mo, 28.10.2013, 14:56, Michael Lawrence wrote: >> > Probably need to update GenomicRanges. >> > >> > >> > On Mon, Oct 28, 2013 at 5:11 AM, Stefanie Tauber < >> > stefanie.tauber@univie.ac.at >> <mailto:stefanie.tauber@univie.ac.at>> wrote: >> > >> >> Hi Hervé, >> >> >> >> I just wanted to try the new coverage function: >> >> >> >> # I read in a bam file, human data mapped against the transcriptome >> >> x = readGAlignments(human_trans_bam) >> >> >> >> # This is my GAligments object >> >> > x >> >> GAlignments with 46765032 alignments and 0 metadata columns: >> >> seqnames strand cigar >> qwidth >> >> <rle> <rle> <character> >> <integer> >> >> [1] hg19_ensGene_ENST00000237247 + 75M >> 75 >> >> [2] hg19_ensGene_ENST00000371036 - 75M >> 75 >> >> [3] hg19_ensGene_ENST00000371037 - 75M >> 75 >> >> [4] hg19_ensGene_ENST00000471889 - 75M >> 75 >> >> [5] hg19_ensGene_ENST00000377479 + 75M >> 75 >> >> ... ... ... ... >> ... >> >> [46765028] hg19_ensGene_ENST00000344867 - 75M >> 75 >> >> [46765029] hg19_ensGene_ENST00000400848 - 75M >> 75 >> >> [46765030] hg19_ensGene_ENST00000400848 - 75M >> 75 >> >> [46765031] hg19_ensGene_ENST00000400848 - 75M >> 75 >> >> [46765032] hg19_ensGene_ENST00000400829 - 75M >> 75 >> >> start end width ngap >> >> <integer> <integer> <integer> <integer> >> >> [1] 1783 1857 75 0 >> >> [2] 690 764 75 0 >> >> [3] 1632 1706 75 0 >> >> [4] 2121 2195 75 0 >> >> [5] 391 465 75 0 >> >> ... ... ... ... ... >> >> [46765028] 280 354 75 0 >> >> [46765029] 509 583 75 0 >> >> [46765030] 509 583 75 0 >> >> [46765031] 509 583 75 0 >> >> [46765032] 179 253 75 0 >> >> --- >> >> seqlengths: >> >> hg19_ensGene_ENST00000237247 ... hg19_ensGene_ENST00000400829 >> >> 2580 ... 999 >> >> >> >> When I want to compute the coverage, I get the following: >> >> > cov_x = coverage(x) >> >> >> >> Error in get(name, envir = asNamespace(pkg), inherits = FALSE) : >> >> object '.Ranges.coverage' not found >> >> >> >> Do you know what I am doing wrong? >> >> >> >> Thanks, >> >> Stefanie >> >> >> >> >> >> >> >> On Fr, 25.10.2013, 11:44, Hervé Pagès wrote: >> >> > Hi Stephanie, >> >> > >> >> > Just to let you know that in GenomicRanges 1.14.2, coverage() >> should >> >> > now be much faster on objects with many seqlevels. Should be about >> >> > 1000x faster or more if you have 50 000 seqlevels. >> >> > >> >> > The only case where it's still slow is if many seqlevels are >> marked as >> >> > circular (and have a non-NA seqlength) in seqinfo(x), because for >> >> > these seqlevels the coverage vector is still post processed by a >> >> > loop in R. Could be a problem if for example people have BAM files >> >> > with reads aligned to thousands of bacterial genomes. >> >> > >> >> > GenomicRanges 1.14.2 and IRanges 1.20.1 should become >> available thru >> >> > biocLite() in the next 24 hours or so. The man pages for the >> >> "coverage" >> >> > methods have been improved, especially in IRanges. >> >> > >> >> > Cheers, >> >> > H. >> >> > >> >> > >> >> > On 10/18/2013 12:19 AM, Stefanie Tauber wrote: >> >> >> Thanks for the answer! >> >> >> Just wanted to make sure that I don't miss a relevant feature . >> >> >> >> >> >> Best, >> >> >> Stefanie >> >> >> >> >> >> Am 17.10.2013 um 22:07 schrieb Hervé Pagès <hpages@fhcrc.org>> <mailto:hpages@fhcrc.org> >> >> >> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">>>: >> >> >> >> >> >>> Hi Stephanie, >> >> >>> >> >> >>> On 10/17/2013 07:37 AM, Stefanie Tauber wrote: >> >> >>>> What is the best way to compute the coverage on a >> ReadGAlignments >> >> >>>> object >> >> >>>> when I have many, many (about 50 000) seqlevels? >> >> >>>> >> >> >>>> Just computing coverage(object) is terribly slow due to the >> huge >> >> >>>> amount of seqlevels. >> >> >>>> >> >> >>>> At the moment, I am subsetting the ReadGAlignments object >> for each >> >> >>>> seqlevel, >> >> >>>> I omit the redundant seqlevels and calculate the coverage. >> >> >>>> This is fast if you do it for a few seqlevels. >> >> >>>> But if one really would like to compute the coverage for each >> >> >>>> seqlevel individually, >> >> >>>> this really takes a long time.. >> >> >>> >> >> >>> Yes this is a known weakness in the current implementation of >> >> >>> coverage(). It loops over the seqlevels and this loop is >> currently >> >> >>> performed in R, which is slow. Unfortunately this is not the >> first >> >> >>> time someone complains about this. >> >> >>> >> >> >>> Time for us to bite the bullet I guess. I've just started to >> work >> >> >>> on moving the loop to the C-level. That should speed up things >> >> >>> significantly. I'll let you know when this is ready. >> >> >>> >> >> >>> Thanks for the feedback. >> >> >>> H. >> >> >>> >> >> >>>> >> >> >>>> >> >> >>>> I would appreciate any comments! >> >> >>>> >> >> >>>> Best, >> >> >>>> Stefanie >> >> >>>> >> >> >>>> _______________________________________________ >> >> >>>> Bioconductor mailing list >> >> >>>> Bioconductor@r-project.org >> <mailto:bioconductor@r-project.org> >> <mailto:bioconductor@r-project.org <mailto:bioconductor@r-project.org="">> >> >> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> >>>> Search the archives: >> >> >>>> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >>>> >> >> >>> >> >> >>> -- >> >> >>> Hervé Pagès >> >> >>> >> >> >>> Program in Computational Biology >> >> >>> Division of Public Health Sciences >> >> >>> Fred Hutchinson Cancer Research Center >> >> >>> 1100 Fairview Ave. N, M1-B514 >> >> >>> P.O. Box 19024 >> >> >>> Seattle, WA 98109-1024 >> >> >>> >> >> >>> E-mail: hpages@fhcrc.org <mailto:hpages@fhcrc.org> >> <mailto:hpages@fhcrc.org <mailto:hpages@fhcrc.org="">> >> >> >>> Phone: (206) 667-5791 <tel:%28206%29%20667-5791> >> >> >>> Fax: (206) 667-1319 <tel:%28206%29%20667-1319> >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor@r-project.org <mailto:bioconductor@r-project.org> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> > >> >> >> -- > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 795 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6