edgeR norm.factors NaN
@davenportcolinmh-hannoverde-5408
Last seen 10.3 years ago
Dear Bioconductors,

I have an issue with calculating normalisation factors in edgeR. This has always worked fine, i.e. on three other datasets, which leaves me baffled here.

To summarise:
- NaNs occur independently of the calcNormFactors method
- the counts appear OK
- no NaNs are present in the counts

virusDGE = calcNormFactors(virusDGE, method="TMM")
virusDGE = calcNormFactors(virusDGE, method="RLE")
virusDGE = calcNormFactors(virusDGE, method="upperquartile")

> virusDGE
An object of class "DGEList"
$samples
        group lib.size norm.factors
counts1   all       17          NaN
counts2   all        8          NaN
counts3   all       14          NaN
counts4   all        4          NaN
counts5   all    18218          NaN
counts6   all    37146          NaN
counts7   all     2579          NaN
counts8   all     1027          NaN

$counts
            counts1 counts2 counts3
MuHV1_gp001       0       0       0
MuHV1_gp002       0       0       0
MuHV1_gp003       0       0       0
MuHV1_gp004       0       0       0
MuHV1_gp005       0       0       0
            counts4 counts5 counts6
MuHV1_gp001       0       0       1
MuHV1_gp002       0       4       5
MuHV1_gp003       0      13      18
MuHV1_gp004       0      11       2
MuHV1_gp005       0       4       6
            counts7 counts8
MuHV1_gp001       0       0
MuHV1_gp002       0       0
MuHV1_gp003       3       0
MuHV1_gp004       3       0
MuHV1_gp005       2       0

is.integer(virusDGE$counts)
#TRUE
is.na(virusDGE$counts)
#(all are FALSE)
> sum(is.na(virusDGE$counts))
#[1] 0

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_2.4.6           limma_3.10.3          GenomicFeatures_1.6.9
[4] AnnotationDbi_1.16.19 Biobase_2.14.0        GenomicRanges_1.6.7
[7] IRanges_1.12.6

loaded via a namespace (and not attached):
[1] biomaRt_2.10.0     Biostrings_2.22.0  BSgenome_1.22.0    DBI_0.2-5
[5] RCurl_1.91-1       RSQLite_0.11.1     rtracklayer_1.14.4 tools_2.14.1
[9] XML_3.9-4          zlibbioc_1.0.1

I am using a custom-built annotation, i.e.

virustxdb=makeTranscriptDb(transcripts, splicings, genes, chrominfo)

It seems to have worked fine so far and counted reads per feature reliably, but could this be the problem?

Thanks for your time,

Colin Davenport

Dr. Colin Davenport
Bioinformatician
Tümmler Group
PFZ S0-7440
Hannover Medical School
Germany
davenport [dot] colin <at> mh-hannover.de
0049 511532-8733

Genomics software available at
http://genomics1.mh-hannover.de
Annotation edgeR
Mark Robinson ▴ 880
@mark-robinson-4908
Last seen 6.1 years ago
Hi Colin,

I believe it's too many zeros. Basically, the docs say:

-----
If 'refColumn' is unspecified, the library whose upper quartile is closest to the mean upper quartile is used.
-----

I think this breaks down with your data.

But the major issue you'll need to deal with is that the first 4 columns of counts contain barely any reads! In 'counts4', you have 4 total reads mapped. I've seen early experiments with tens of thousands of total mapped reads, but <20 is surely a mistake. Are you sure this experiment worked, or that your custom annotation has captured the mappings correctly?

Best,
Mark

On 19.07.2012, at 11:02, <davenport.colin at mh-hannover.de> wrote:

> Dear Bioconductors,
>
> I have an issue with calculating normalisation factors in edgeR. This has always i.e. on three other datasets worked just fine, which leads me baffled here.
>
> To summarise-
> -NaNs occur independently of the calcNormFactors method
> -the counts appear ok
> -no NaNs are present in the counts

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin

----------
http://www.fgcz.ch/Bioconductor2012
http://www.eccb12.org/t5
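To illustrate the point about sparse libraries, here is a minimal base-R sketch. It is not edgeR's actual code. It assumes that upper-quartile factors are computed as quantile / library size and then rescaled to a geometric mean of 1, and it uses only the five genes printed in the question as toy data, so the library sizes differ from the real ones. The `min.depth` threshold is a hypothetical value chosen for this example only.

```r
# Toy matrix: only the five genes shown in the question, not the full data set.
counts <- matrix(
  c(0, 0, 0, 0,  0,  1, 0, 0,
    0, 0, 0, 0,  4,  5, 0, 0,
    0, 0, 0, 0, 13, 18, 3, 0,
    0, 0, 0, 0, 11,  2, 3, 0,
    0, 0, 0, 0,  4,  6, 2, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("MuHV1_gp00", 1:5), paste0("counts", 1:8))
)

lib.size  <- colSums(counts)        # total reads per sample (toy values)
zero.frac <- colMeans(counts == 0)  # fraction of genes with a zero count
uq <- apply(counts, 2, quantile, probs = 0.75, names = FALSE)  # upper quartile

print(data.frame(lib.size, zero.frac, uq))

# Assumed scaling scheme: upper quartile relative to library size.
# Empty libraries give 0/0 = NaN.
f <- uq / lib.size

# Reference choice as quoted from the docs: the library whose upper quartile
# is closest to the mean. With NaN in f the mean is NaN and nothing is chosen.
ref <- which.min(abs(f - mean(f)))  # integer(0)

# Rescale to a geometric mean of 1: a single NaN propagates through
# mean(log(f)), so every sample ends up NaN.
scaled <- f / exp(mean(log(f)))
print(scaled)

# Hypothetical remedy: drop near-empty samples before normalising.
min.depth <- 5                      # arbitrary threshold for this toy example
keep <- lib.size >= min.depth
f.keep <- uq[keep] / lib.size[keep]
ref.keep <- which.min(abs(f.keep - mean(f.keep)))
scaled.keep <- f.keep / exp(mean(log(f.keep)))
print(scaled.keep)
```

On the real object, the analogous check would be to look at `virusDGE$samples$lib.size` and `colMeans(virusDGE$counts == 0)` before calling calcNormFactors, and to work out why the first four libraries have so few reads before trusting any normalisation.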