Entering edit mode
delhomme@embl.de
★
1.2k
@delhommeemblde-3232
Last seen 10.6 years ago
Dear Mark,
I'm cc'ing the Bioconductor mailing list in case it helps others.
I've implemented the support for variable read length (version 1.3.7
of easyRNASeq, available in SVN now and in a couple of days from
Bioc). Thanks for your reproducible use case. Below is the code I ran:
library(easyRNASeq)
counts <- easyRNASeq(organism="Dmelanogaster",
annotationMethod="gtf",
annotationFile="Drosophila_melanogaster.BDGP5.67.gtf",
gapped=TRUE, count="genes",
summarization="geneModels",
pattern=".subread_s.bam$",
filesDirectory=".")
It took 4 minutes on my machine. Having variable read length does
affect the performance but I'm working on that.
In the command line, note that I removed the argument 'format' as bam
is now the default. Since you're using 'bam', the arguments chr.sizes
is not necessary either as the chromosome lengths are extracted from
the BAM file. Using the chr.map is not necessary for Dmelanogaster as
easyRNASeq should correctly convert the chromosome names. To list
which organism easyRNASeq support that conversion for, I have added a
method: knownOrganisms that lists them. If your organism is not part
of that list but the chromosome/scaffold names are identical between
the BAM file and the annotation file, it will work smoothly too.
Thanks again,
Cheers,
Nico
---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------
On Jun 13, 2012, at 2:21 PM, Mark Robinson wrote:
> Dear Nico,
>
> I get this error (see below for full details):
>
> Error in fetchCoverage(obj, format = format, filename = file, filter
= filter, :
> The file ./SRR031714.subread_s.bam contains reads of different
sizes: 74, 70 . We cannot deal with such data at the moment. Please
contact the authors to add this functionality.
>
> With read trimming and so on, won't this be a common occurrence?
Your error message says to ask you to add this functionality, so I
ask.
>
> Best,
> Mark
>
>
>> # setup
>> gtf <- "Drosophila_melanogaster.BDGP5.67.gtf"
>> fn <- dir(,".subread_s.bam$") # file names
>>
>> nm <- names( seqlengths(Dmelanogaster) )
>> chr.map <- data.frame(from=nm,to=gsub("chr","",nm))
>> #chr.map <- data.frame(from=nm,to=gsub("chr","",nm))
>>
>> counts <- easyRNASeq(organism="Dmelanogaster",
chr.sizes=as.list(seqlengths(Dmelanogaster)),
> + readLength=74L, annotationMethod="gtf",
annotationFile=gtf, format="bam", gapped=TRUE,
> + count="exons", filenames=fn,
filesDirectory=".", chr.map=chr.map)
> Checking arguments...
> Fetching annotations...
> Read 308180 records
> Summarizing counts...
> Processing SRR031708.subread_s.bam
> Processing SRR031709.subread_s.bam
> Processing SRR031710.subread_s.bam
> Processing SRR031711.subread_s.bam
> Processing SRR031712.subread_s.bam
> Processing SRR031713.subread_s.bam
> Processing SRR031714.subread_s.bam
> Error in fetchCoverage(obj, format = format, filename = file, filter
= filter, :
> The file ./SRR031714.subread_s.bam contains reads of different
sizes: 74, 70 . We cannot deal with such data at the moment. Please
contact the authors to add this functionality.
> In addition: There were 23 warnings (use warnings() to see them)
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
> [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets
methods
> [8] base
>
> other attached packages:
> [1] GenomicFeatures_1.8.1
> [2] AnnotationDbi_1.18.1
> [3] BSgenome.Dmelanogaster.UCSC.dm3_1.3.17
> [4] easyRNASeq_1.2.3
> [5] ShortRead_1.14.4
> [6] latticeExtra_0.6-19
> [7] RColorBrewer_1.0-5
> [8] lattice_0.20-6
> [9] Rsamtools_1.8.5
> [10] DESeq_1.8.3
> [11] locfit_1.5-8
> [12] BSgenome_1.24.0
> [13] GenomicRanges_1.8.6
> [14] Biostrings_2.24.1
> [15] IRanges_1.14.3
> [16] edgeR_2.6.7
> [17] limma_3.12.1
> [18] biomaRt_2.12.0
> [19] Biobase_2.16.0
> [20] genomeIntervals_1.12.0
> [21] BiocGenerics_0.2.0
> [22] intervals_0.13.3
>
> loaded via a namespace (and not attached):
> [1] annotate_1.34.0 bitops_1.0-4.1 DBI_0.2-5
genefilter_1.38.0
> [5] geneplotter_1.34.0 grid_2.15.0 hwriter_1.3
RCurl_1.91-1
> [9] RSQLite_0.11.1 rtracklayer_1.16.1 splines_2.15.0
stats4_2.15.0
> [13] survival_2.36-14 tools_2.15.0 XML_3.9-4
xtable_1.7-0
> [17] zlibbioc_1.2.0
>
>
>
>
> ----------
> Prof. Dr. Mark Robinson
> Bioinformatics
> Institute of Molecular Life Sciences
> University of Zurich
> Winterthurerstrasse 190
> 8057 Zurich
> Switzerland
>
> v: +41 44 635 4848
> f: +41 44 635 6898
> e: mark.robinson at imls.uzh.ch
> o: Y11-J-16
> w: http://tiny.cc/mrobin
>