easyRNASeq with Ensembl GRCh37 help
Aki Hoji ▴ 10
@aki-hoji-6155
United States
Hi,

I've been trying to generate an output file for DESeq2 with easyRNASeq. The input file is a BAM file generated by Tophat2/Bowtie2 with Ensembl GRCh37.72, which was part of Illumina's iGenomes package. I followed the easyRNASeq overview and examples from the BioC mailing list and ran the following:

    testcount <- easyRNASeq(filesDirectory=getwd(), organism="Hsapiens", chr.sizes="auto",
                            readLength=100L, annotationMethod="gtf", annotationFile="Ensemble.gtf",
                            count="exons", outputFormat="DESeq", filenames="4673Bsorted.bam")

Then I got this error:

    Checking arguments...
    Fetching annotations...
    Read 2280612 records
    Error in easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
      The number of conditions: 0 did not correspond to the number of samples: 1
    In addition: Warning messages:
    1: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
      You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it.
    2: In .Method(..., deparse.level = deparse.level) :
      number of columns of result is not a multiple of vector length (arg 1)
    3: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
      There are 966272 features/exons defined in your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
    4: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto", :
      You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it.

As far as I can tell, I am not really enforcing the UCSC chromosome convention, and chr.sizes could be set to "auto" since a BAM file is used. I am stuck at this point, and any help or pointers would be much appreciated.

Thanks.

AH

    > sessionInfo()
    R version 3.0.1 (2013-05-16)
    Platform: x86_64-apple-darwin10.8.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] parallel  stats  graphics  grDevices  utils  datasets  methods  base

    other attached packages:
     [1] easyRNASeq_1.6.0      ShortRead_1.18.0        latticeExtra_0.6-26
     [4] RColorBrewer_1.0-5    Rsamtools_1.12.4        DESeq_1.12.1
     [7] lattice_0.20-23       locfit_1.5-9.1          BSgenome_1.28.0
    [10] GenomicRanges_1.12.5  Biostrings_2.28.0       IRanges_1.18.3
    [13] edgeR_3.2.4           limma_3.16.7            biomaRt_2.16.0
    [16] Biobase_2.20.1        genomeIntervals_1.16.0  BiocGenerics_0.6.0
    [19] intervals_0.14.0      BiocInstaller_1.10.3

    loaded via a namespace (and not attached):
     [1] annotate_1.38.0    AnnotationDbi_1.22.6  bitops_1.0-6
     [4] DBI_0.2-7          genefilter_1.42.0     geneplotter_1.38.0
     [7] grid_3.0.1         hwriter_1.3           RCurl_1.95-4.1
    [10] RSQLite_0.11.4     splines_3.0.1         stats4_3.0.1
    [13] survival_2.37-4    tools_3.0.1           XML_3.95-0.2
    [16] xtable_1.7-1       zlibbioc_1.6.0
Annotation Organism easyRNASeq DESeq2
@delhommeemblde-3232
Hej Aki Hoji!

You can indeed ignore the warnings. The error is this:

> The number of conditions: 0 did not correspond to the number of samples: 1

To use the DESeq output, you need to specify the conditions; see the ?easyRNASeq help page and the easyRNASeq and DESeq vignettes (e.g. vignette("easyRNASeq")) for more details on the arguments and on how to use DESeq. Even if you provide a condition, easyRNASeq is bound to fail again, as DESeq can't work with a single sample.

Finally, note that easyRNASeq currently returns only a DESeq output and not a DESeq2 output (i.e. a CountDataSet and not a SummarizedExperiment). DESeq2 output is planned for the next release, due in early October.

Best,
Nico

---------------------------------------------------------------
Nicolas Delhomme
Genome Biology Computational Support
European Molecular Biology Laboratory
---------------------------------------------------------------

On 16 Sep 2013, at 20:17, Aki Hoji wrote:

> Hi,
>
> I've been trying to generate an output file for DESeq2 by easyRNASeq. [...]
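
As an illustration of the point about conditions, here is a minimal sketch of how a named conditions vector could be passed to the call from the question. It assumes the easyRNASeq 1.6-era interface, where `conditions` takes one label per sample, named by the corresponding file name. The four BAM file names and the "control"/"treated" labels are hypothetical placeholders, not files from this thread; the other arguments are copied from the original call. The easyRNASeq call is guarded so that the snippet only attempts it when the package and the files are actually present.

```r
# Hypothetical BAM files: two replicates for each of two conditions.
# (DESeq cannot work with a single sample, so more than one file is needed.)
bam_files <- c("ctrl_1.bam", "ctrl_2.bam", "treat_1.bam", "treat_2.bam")

# One condition label per sample, named by the corresponding file name.
# The labels are assumptions made for this example.
conditions <- c("control", "control", "treated", "treated")
names(conditions) <- bam_files

# Sanity checks that mirror the error message in the thread:
# the number of conditions must equal the number of samples.
stopifnot(length(conditions) == length(bam_files),
          length(bam_files) > 1L)

# Only attempt the run when the package is installed and the files exist
# in the working directory.
if (requireNamespace("easyRNASeq", quietly = TRUE) &&
    all(file.exists(bam_files))) {
  library(easyRNASeq)
  cds <- easyRNASeq(filesDirectory = getwd(),
                    organism = "Hsapiens",
                    chr.sizes = "auto",
                    readLength = 100L,
                    annotationMethod = "gtf",
                    annotationFile = "Ensemble.gtf",
                    count = "exons",
                    outputFormat = "DESeq",
                    filenames = bam_files,
                    conditions = conditions)
}
```

With outputFormat="DESeq" the result would be a DESeq CountDataSet, as noted in the reply, so it would still need converting before use with DESeq2.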