error in easyRNASeq
1
0
Entering edit mode
@mayte-suarez-farinas-2068
Last seen 9.6 years ago
United States
Dear All, I am trying to use the easyRNASeq package with 6 RNAseq samples from mouse. Single end, unstranded protocol, 50bp reads. I followed the vignettes and wrote this short code: library("easyRNASeq") library(Rsamtools) library(DESeq) library(edgeR) library(GenomicRanges) library(parallel) library(BSgenome.Mmusculus.UCSC.mm10) BamPath<-'/MyBAMDirectoryBAM/' fls = list.files(path=BamPath,recursive=TRUE, pattern="*all.accepted_hits.bam$", full=FALSE) conditions<-c(A,A,A,B,B,B) CountsGenes_biomart<- easyRNASeq(filesDirectory=BamPath, + filenames=fls, nbCore=5, + organism='Mmusculus', + chr.sizes=as.list(seqlengths(Mmusculus)), + readLength=50L, + annotationMethod="biomaRt", + format="bam", + count=c('genes'), + outputFormat=c("edgeR"), + summarization=c("geneModels"), + conditions=conditions) But I get the following Error: Error in fetchCoverage(obj, format = format, filename = file, filter = filter, : Some of your read coordinates are bigger than the chromosome sizes you provided. Aborting! In addition: Warning messages: 1: In easyRNASeq(filesDirectory = BamPath, filenames = fls, nbCore = 5, : You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it. 2: In easyRNASeq(filesDirectory = BamPath, filenames = fls, nbCore = 5, : There are 5478 synthetic exons as determined from your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want? 3: In fetchCoverage(obj, format = format, filename = file, filter = filter, : You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it. Can some one shed some light on the msitake I am making here?? sessionInfo is included bellow, Thanks in advance Mayte Suarez-Farinas sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] BSgenome.Dmelanogaster.UCSC.dm3_1.3.17 [2] RnaSeqTutorial_0.0.10 [3] BSgenome.Mmusculus.UCSC.mm10_1.3.17 [4] easyRNASeq_1.2.5 [5] ShortRead_1.14.4 [6] latticeExtra_0.6-24 [7] RColorBrewer_1.0-5 [8] lattice_0.20-10 [9] Rsamtools_1.8.6 [10] DESeq_1.8.3 [11] locfit_1.5-8 [12] BSgenome_1.24.0 [13] GenomicRanges_1.8.13 [14] Biostrings_2.24.1 [15] IRanges_1.14.4 [16] edgeR_2.6.12 [17] limma_3.12.3 [18] biomaRt_2.12.0 [19] Biobase_2.16.0 [20] genomeIntervals_1.12.0 [21] BiocGenerics_0.2.0 [22] intervals_0.13.3 [23] BiocInstaller_1.4.9 loaded via a namespace (and not attached): [1] affy_1.34.0 affyio_1.24.0 annotate_1.34.1 [4] AnnotationDbi_1.18.4 bitops_1.0-4.1 DBI_0.2-5 [7] gcrma_2.28.0 genefilter_1.38.0 geneplotter_1.34.0 [10] grid_2.15.1 hwriter_1.3 preprocessCore_1.18.0 [13] RCurl_1.95-1.1 RSQLite_0.11.2 simpleaffy_2.32.0 [16] splines_2.15.1 stats4_2.15.1 survival_2.36-14 [19] tools_2.15.1 XML_3.95-0.1 xtable_1.7-0 [22] zlibbioc_1.2.0 > Mayte Suarez-Farinas Research Assistant Professor, Laboratory of Investigative Dermatology Biostatistician, Center for Clinical and Translational Science The Rockefeller University 1230 York Ave, Box 178, New York, NY, 10065 Phone: +1(212) 327-8213 Fax: +1(212) 327-8232 [[alternative HTML version deleted]]
RNASeq BSgenome BSgenome easyRNASeq RNASeq BSgenome BSgenome easyRNASeq • 1.5k views
ADD COMMENT
0
Entering edit mode
@delhommeemblde-3232
Last seen 10.2 years ago
Dear Mayte, This message tell you what is happening: > Some of your read coordinates are bigger than the chromosome sizes you provided. I.e. some of your reads in your bam file have coordinates that are outside the chromosomal range you define. Moreover your version of easyRNASeq is fairly outdated, 1.2.5 versus 1.4.2 in the repository. If you get that last version and have a look at the vignette, you'll see how this error can be handled more easily now. I'll sketch it below: ## you do not need to load other libraries, ## all the one you mentioned below (from Rsamtools to parallel) are loaded by easyRNASeq library(easyRNASeq) ## if your bam file has a proper header, you do not need to precise the BSgenome library either ## note that your condition vector need to be named BamPath<-'/MyBAMDirectoryBAM/' fls = list.files(path=BamPath,recursive=TRUE, pattern="*all.accepted_hits.bam$", full=FALSE) conditions<-c(A,A,A,B,B,B) names(conditions) <- fls ## count the genes ## the chromosome size is extracted from the bam header. ## it might be that there are discrepancies between the chromosome naming ## between your bam file and the annotation you retrieve from biomaRt. Check out the use case in the new vignette if that occurs. CountsGenes_biomart <- easyRNASeq( filesDirectory=BamPath, filenames=fls, nbCore=5, organism='Mmusculus', ## not necessary if in your bam file header ## chr.sizes=as.list(seqlengths(Mmusculus)), ## not necessary either, will be determined from your read alignment ## readLength=50L, ## it seems to me that you used tophat to align your reads, so ## if it is the case, you'll probably have gapped alignments ## in which case you want to use the following (which I leave commented as it is just a guess from me): ## gapped = TRUE, annotationMethod="biomaRt", format="bam", count=c('genes'), outputFormat=c("edgeR"), summarization=c("geneModels"), conditions=conditions) Hope this helps, Nico --------------------------------------------------------------- Nicolas Delhomme Genome Biology Computational Support European Molecular Biology Laboratory Tel: +49 6221 387 8310 Email: nicolas.delhomme at embl.de Meyerhofstrasse 1 - Postfach 10.2209 69102 Heidelberg, Germany --------------------------------------------------------------- On Nov 8, 2012, at 7:14 PM, Mayte Suarez-Farinas wrote: > Dear All, > > I am trying to use the easyRNASeq package with 6 RNAseq samples from mouse. > Single end, unstranded protocol, 50bp reads. I followed the vignettes and wrote this short code: > > library("easyRNASeq") > library(Rsamtools) > library(DESeq) > library(edgeR) > library(GenomicRanges) > library(parallel) > library(BSgenome.Mmusculus.UCSC.mm10) > > BamPath<-'/MyBAMDirectoryBAM/' > fls = list.files(path=BamPath,recursive=TRUE, pattern="*all.accepted_hits.bam$", full=FALSE) > conditions<-c(A,A,A,B,B,B) > > CountsGenes_biomart<- easyRNASeq(filesDirectory=BamPath, > + filenames=fls, nbCore=5, > + organism='Mmusculus', > + chr.sizes=as.list(seqlengths(Mmusculus)), > + readLength=50L, > + annotationMethod="biomaRt", > + format="bam", > + count=c('genes'), > + outputFormat=c("edgeR"), > + summarization=c("geneModels"), > + conditions=conditions) > > But I get the following Error: > > Error in fetchCoverage(obj, format = format, filename = file, filter = filter, : > Some of your read coordinates are bigger than the chromosome sizes you provided. Aborting! > In addition: Warning messages: > 1: In easyRNASeq(filesDirectory = BamPath, filenames = fls, nbCore = 5, : > You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it. > 2: In easyRNASeq(filesDirectory = BamPath, filenames = fls, nbCore = 5, : > There are 5478 synthetic exons as determined from your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want? > 3: In fetchCoverage(obj, format = format, filename = file, filter = filter, : > You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it. > > Can some one shed some light on the msitake I am making here?? sessionInfo is included bellow, > > Thanks in advance > > Mayte Suarez-Farinas > > > > sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > locale: > [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] BSgenome.Dmelanogaster.UCSC.dm3_1.3.17 > [2] RnaSeqTutorial_0.0.10 > [3] BSgenome.Mmusculus.UCSC.mm10_1.3.17 > [4] easyRNASeq_1.2.5 > [5] ShortRead_1.14.4 > [6] latticeExtra_0.6-24 > [7] RColorBrewer_1.0-5 > [8] lattice_0.20-10 > [9] Rsamtools_1.8.6 > [10] DESeq_1.8.3 > [11] locfit_1.5-8 > [12] BSgenome_1.24.0 > [13] GenomicRanges_1.8.13 > [14] Biostrings_2.24.1 > [15] IRanges_1.14.4 > [16] edgeR_2.6.12 > [17] limma_3.12.3 > [18] biomaRt_2.12.0 > [19] Biobase_2.16.0 > [20] genomeIntervals_1.12.0 > [21] BiocGenerics_0.2.0 > [22] intervals_0.13.3 > [23] BiocInstaller_1.4.9 > > loaded via a namespace (and not attached): > [1] affy_1.34.0 affyio_1.24.0 annotate_1.34.1 > [4] AnnotationDbi_1.18.4 bitops_1.0-4.1 DBI_0.2-5 > [7] gcrma_2.28.0 genefilter_1.38.0 geneplotter_1.34.0 > [10] grid_2.15.1 hwriter_1.3 preprocessCore_1.18.0 > [13] RCurl_1.95-1.1 RSQLite_0.11.2 simpleaffy_2.32.0 > [16] splines_2.15.1 stats4_2.15.1 survival_2.36-14 > [19] tools_2.15.1 XML_3.95-0.1 xtable_1.7-0 > [22] zlibbioc_1.2.0 >> > > Mayte Suarez-Farinas > Research Assistant Professor, Laboratory of Investigative Dermatology > Biostatistician, Center for Clinical and Translational Science > The Rockefeller University > 1230 York Ave, Box 178, > New York, NY, 10065 > Phone: +1(212) 327-8213 > Fax: +1(212) 327-8232 > > > > > > > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT

Login before adding your answer.

Traffic: 854 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6