minimal number of features tested in edgeR
Guest User
@guest-user-4897
Hi,

I have a question regarding the minimal number of genes that can be tested in an analysis with edgeR. Let me explain: in a study, edgeR has been used to test the differential expression of three viruses between two conditions, without considering the counts on any other features. That is, the count matrix d$counts has only three rows (and 4 columns, as there are two replicates per condition). The library sizes, however, correspond to the total number of tags aligned both to these viruses and to the genes of the host organism.

This seems inappropriate to me, as I don't understand how the dispersion could be estimated reliably from only three features, but maybe I'm wrong... May I have your opinion? In your view, what is the minimal number of features that can be tested using edgeR?

Thank you in advance for your help.

Best regards,
Stéphanie

-- output of sessionInfo():

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_2.6.2  limma_3.12.0

loaded via a namespace (and not attached):
 [1] annotate_1.34.0      AnnotationDbi_1.18.0 Biobase_2.16.0
 [4] BiocGenerics_0.2.0   DBI_0.2-5            DESeq_1.8.2
 [7] genefilter_1.38.0    geneplotter_1.34.0   grid_2.15.0
[10] IRanges_1.14.3       RColorBrewer_1.0-5   RSQLite_0.11.1
[13] splines_2.15.0       stats4_2.15.0        survival_2.36-14
[16] xtable_1.7-0

-- Sent via the guest posting facility at bioconductor.org.
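To make the setup in the question concrete: edgeR's DGEList() takes its library sizes from the column sums of the count matrix unless lib.size is supplied, and in the study described the supplied values are the totals over viruses and host genes. The Python sketch below (not edgeR code) uses made-up counts and library sizes, not data from the study, to show what that choice changes when counts are scaled to counts per million. The names `virus_counts`, `total_lib_sizes` and `cpm` are hypothetical.

```python
import numpy as np

# Made-up counts, NOT data from the study: 3 viral features (rows) x 4 libraries
# (columns: condition A rep 1, A rep 2, condition B rep 1, B rep 2).
virus_counts = np.array([
    [120, 150, 300, 280],
    [40, 35, 90, 110],
    [5, 8, 6, 4],
], dtype=float)

# Made-up library sizes: total tags aligned to the viruses AND the host genes,
# so far larger than the column sums of the three-row table.
total_lib_sizes = np.array([2.1e6, 2.4e6, 1.9e6, 2.2e6])


def cpm(counts: np.ndarray, lib_sizes: np.ndarray) -> np.ndarray:
    """Counts per million, dividing each column by its library size."""
    return counts / lib_sizes[np.newaxis, :] * 1e6


# Scaling by the supplied totals: each virus is measured relative to the whole library.
cpm_total = cpm(virus_counts, total_lib_sizes)

# Scaling by the column sums of the three rows (the default if lib.size is not given):
# every column is forced to sum to one million, so a rise in one virus
# necessarily appears as a relative drop in the other two.
cpm_colsum = cpm(virus_counts, virus_counts.sum(axis=0))

print(np.round(cpm_total, 2))
print(cpm_colsum.sum(axis=0))
```

With the external totals, the three viral rows are not constrained to add up to anything, which is why supplying lib.size here is a deliberate modelling choice rather than a detail.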
Mark Robinson
@mark-robinson-4908
Hi Stephanie,

In theory, the minimal number of features you can test is 1. With your three rows (2 groups of 2 replicates, i.e. 4 - 2 = 2 residual degrees of freedom per feature), you have 6 degrees of freedom to estimate a common dispersion, as opposed to 2 with just one feature. This should "help", and I would consider it an improvement. Provided some other things fall into place (e.g. it is reasonable to assume, at least to a first-order approximation, that the features share the same dispersion), this should be OK.

You could also consider using other features (ones you have presumably filtered out?) just for the purpose of estimating the dispersion, and only test the 3 features of interest. This only helps if those features are representative, which gets a bit hard to defend.

Anyway, these are just opinions and possibilities.

Best,
Mark

On 25.10.2012, at 11:23, Stephanie [guest] wrote:
> For you, what is the minimal number of features that we can test using edgeR?
----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin
----------
http://www.fgcz.ch/Bioconductor2012
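The degrees-of-freedom argument in the answer above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not edgeR's method: edgeR estimates dispersion by (conditional) likelihood, whereas `common_dispersion_mom` here is a hypothetical, crude method-of-moments estimator assuming Var(y) = mu + phi * mu**2, one phi shared by all rows, and equal library sizes. Its denominator is exactly the pooled residual degrees of freedom, which is the quantity the answer refers to. The counts and the extra "representative" rows are made up.

```python
import numpy as np


def residual_df(n_samples: int, n_groups: int, n_features: int) -> int:
    """Residual degrees of freedom pooled over features assumed to share one dispersion."""
    return n_features * (n_samples - n_groups)


def common_dispersion_mom(counts, groups):
    """Crude method-of-moments estimate of one negative binomial dispersion phi.

    Assumptions (illustration only, NOT edgeR's estimator):
      * Var(y) = mu + phi * mu**2, with the same phi for every row;
      * equal library sizes, or counts already scaled to a common depth.
    Returns (phi_hat, pooled_df).
    """
    counts = np.asarray(counts, dtype=float)
    groups = np.asarray(groups)
    numerator = 0.0
    pooled_df = 0
    for row in counts:
        for g in np.unique(groups):
            y = row[groups == g]
            df = y.size - 1
            ybar = y.mean()
            if df == 0 or ybar == 0:
                continue  # this row/group carries no information about phi
            ss = np.sum((y - ybar) ** 2)
            # E[ss] is roughly df * (mu + phi * mu**2), so this term is roughly df * phi
            numerator += (ss - df * ybar) / ybar ** 2
            pooled_df += df
    if pooled_df == 0:
        return float("nan"), 0
    # Truncate at zero: a negative moment estimate means "no detectable extra-Poisson variation"
    return max(numerator / pooled_df, 0.0), pooled_df


groups = ["A", "A", "B", "B"]
# Hypothetical viral counts (made up, not from the study).
virus_counts = np.array([[120, 150, 300, 280], [40, 35, 90, 110], [5, 8, 6, 4]])
# Hypothetical extra "representative" rows, used only to estimate phi, not tested.
other_counts = np.array([[500, 520, 480, 510], [60, 70, 65, 55]])

print(residual_df(4, 2, 1), residual_df(4, 2, 3))  # prints: 2 6
phi_virus, df_virus = common_dispersion_mom(virus_counts, groups)
phi_pooled, df_pooled = common_dispersion_mom(np.vstack([virus_counts, other_counts]), groups)
print(phi_virus, df_virus)
print(phi_pooled, df_pooled)
```

With these made-up numbers the raw moment estimate from the three viral rows alone is negative and gets truncated to zero, which illustrates how little 6 degrees of freedom can pin down. Adding the extra rows raises the pooled degrees of freedom to 10, but, as the answer says, that only helps if those rows really share the viruses' dispersion. In edgeR itself, the corresponding route would be to estimate a common dispersion (e.g. with estimateCommonDisp()) on a DGEList that also contains the extra features and then test only the rows of interest; the edgeR User's Guide for the version in use should be checked to confirm how the dispersion is carried over when subsetting.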