multiple level factors contrast in nbinomLRT, DESeq2.
2
1
Entering edit mode
shao ▴ 100
@shao-6241
Last seen 6.8 years ago
Germany
Dear list, I am not sure the correct way to interpret the nbinomLRT results in multiple level factors condition. Here is a toy example. I would to find DE genes after controlling batch effect in the experiments, in which there are multiple levels. ## make data library(DESeq2)dds <- makeExampleDESeqDataSet(m = 18)colData <- data.frame(row.names = rownames(colData(dds)), sample = colData(dds)$sample, condition = rep(LETTERS[1:6], each = 3), batch = factor(rep(c(1, 1, 2), 6)))dds <- DESeqDataSetFromMatrix(counts(dds), colData = colData, design = ~ batch + condition)dds <- estimateSizeFactors(dds)dds <- estimateDispersions(dds)## LRTestdds <- nbinomLRT(dds, reduced = formula(~ batch ))resultsNames(dds)[1] "Intercept" "batch_2_vs_1" "condition_B_vs_A" "condition_C_vs_A" "condition_D_vs_A" "condition_E_vs_A" "condition_F_vs_A"res.1 <- results(dds, name = "condition_B_vs_A")res.2 <- results(dds, contrast = list("condition_B_vs_A", "batch_2_vs_1") Questions:1.) I would like to get the DE genes between B and A, while controlling for the batch effect. Should I take "res.1" or "res.2", or both are wrong? and what is the correct way to do it? 2.) Why "res.3 <- results(dds, contrast = c("condition", "A", "b"))" gave error: "Error in normalizeSingleBracketSubscript(j, x) : subscript contains invalid names" 3.) In order to get DE genes between conditions not directly listed in the "resultsNames", is the following codes correct? should "batch_2_vs_1" be included int the contrast?##for example, DE between C and F, controlling for batch effect;res.4 <- results(dds, contrast = list("condition_C_vs_A", "condition_F_vs_A") Best, Chunxuan [[alternative HTML version deleted]]
• 4.3k views
ADD COMMENT
0
Entering edit mode
shao ▴ 100
@shao-6241
Last seen 6.8 years ago
Germany
Dear list, I am not sure the correct way to interpret the nbinomLRT results in multiple level factors condition. Here is a toy example. I would to find DE genes after controlling batch effect in the experiments, in which there are multiple levels. ## make data library(DESeq2) dds <- makeExampleDESeqDataSet(m = 18) colData <- data.frame(row.names = rownames(colData(dds)), sample = colData(dds)$sample, condition = rep(LETTERS[1:6], each = 3), batch = factor(rep(c(1, 1, 2), 6))) dds <- DESeqDataSetFromMatrix(counts(dds), colData = colData, design = ~ batch + condition) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) ## LRTest dds <- nbinomLRT(dds, reduced = formula(~ batch )) resultsNames(dds) [1] "Intercept" "batch_2_vs_1" "condition_B_vs_A" "condition_C_vs_A" "condition_D_vs_A" "condition_E_vs_A" "condition_F_vs_A" res.1 <- results(dds, name = "condition_B_vs_A") res.2 <- results(dds, contrast = list("condition_B_vs_A", "batch_2_vs_1") Questions: 1.) I would like to get the DE genes between B and A, while controlling for the batch effect. Should I take "res.1" or "res.2", or both are wrong? and what is the correct way to do it? 2.) Why "res.3 <- results(dds, contrast = c("condition", "A", "b"))" gave error: "Error in normalizeSingleBracketSubscript(j, x) : subscript contains invalid names" 3.) In order to get DE genes between conditions not directly listed in the "resultsNames", is the following codes correct? should "batch_2_vs_1" be included int the contrast?##for example, DE between C and F, controlling for batch effect;res.4 <- results(dds, contrast = list("condition_C_vs_A", "condition_F_vs_A") > sessionInfo()R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel graphics grDevices utils datasets stats grid methods base other attached packages: [1] DESeq2_1.4.5 RcppArmadillo_0.4.320.0 Rcpp_0.11.2 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.10 BiocGenerics_0.10.0 [8] magrittr_1.0.1 dplyr_0.2 RColorBrewer_1.0-5 reshape2_1.4 ggplot2_1.0.0 loaded via a namespace (and not attached): [1] annotate_1.42.1 AnnotationDbi_1.26.0 assertthat_0.1 Biobase_2.24.0 colorspace_1.2-4 DBI_0.2-7 digest_0.6.4 genefilter_1.46.1 [9] geneplotter_1.42.0 gtable_0.1.2 lattice_0.20-29 locfit_1.5-9.1 MASS_7.3-33 munsell_0.4.2 plyr_1.8.1 proto_0.3-10 [17] R.methodsS3_1.6.1 R.oo_1.18.0 RSQLite_0.11.4 scales_0.2.4 splines_3.1.0 stats4_3.1.0 stringr_0.6.2 survival_2.37-7 [25] tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 XVector_0.4.0 Best, Chunxuan PS, sorry for the previous one, I forgot the session, and have no idea why the format is a mess. [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi On 20/08/14 12:43, sh. chunxuan wrote: > I am not sure the correct way to interpret the nbinomLRT results in > multiple level factors condition. Is there a reason why you decided to use 'nbinomLRT' rather than 'nbinomWaldTest', which is suggested in the vignette for standard use cases? > Here is a toy example. I would to find DE genes after controlling > batch effect in the experiments, in which there are multiple levels. In your case, the LRT allows you to find genes which are affected by the experimental condition in _some_ way, i.e., for which you can reject the null hypothesis "This gene is expressed at the same level in _all_ six conditions; i.e., it is not affected by the experimental condition at all." However, you want to perform specific contrasts: > 1.) I would like to get the DE genes between B and A, while > controlling for the batch effect. Should I take "res.1" or "res.2", or > both are wrong? and what is the correct way to do it? This is much easier with a Wald test. You just run 'nbinomWaldTest' and then ask for results( dds, contrast = c( "condition", "B", "A" ) ) Simon
ADD REPLY
0
Entering edit mode
@mikelove
Last seen 31 minutes ago
United States
Hi Chunxuan, On Aug 20, 2014 12:38 PM, "sh. chunxuan" <hibergo at="" outlook.com=""> wrote: > > > > > Dear list, > I am not sure the correct way to interpret the nbinomLRT results in multiple level factors condition. > Here is a toy example. I would to find DE genes after controlling batch effect in the experiments, in which there are multiple levels. > ## make data library(DESeq2)dds <- makeExampleDESeqDataSet(m = 18)colData <- data.frame(row.names = rownames(colData(dds)), sample = colData(dds)$sample, condition = rep(LETTERS[1:6], each = 3), batch = factor(rep(c(1, 1, 2), 6)))dds <- DESeqDataSetFromMatrix(counts(dds), colData = colData, design = ~ batch + condition)dds <- estimateSizeFactors(dds)dds <- estimateDispersions(dds)## LRTestdds <- nbinomLRT(dds, reduced = formula(~ batch ))resultsNames(dds)[1] "Intercept" "batch_2_vs_1" "condition_B_vs_A" "condition_C_vs_A" "condition_D_vs_A" "condition_E_vs_A" "condition_F_vs_A"res.1 <- results(dds, name = "condition_B_vs_A")res.2 <- results(dds, contrast = list("condition_B_vs_A", "batch_2_vs_1") > Questions:1.) I would like to get the DE genes between B and A, while controlling for the batch effect. You should use the default Wald test for such a comparison, not the LRT. See the vignette for an explanation of the difference between the two tests. So simply use DESeq() and then use: results(dds, contrast = c("condition", "B", "A")) > Should I take "res.1" or "res.2", or both are wrong? and what is the correct way to do it? > 2.) Why "res.3 <- results(dds, contrast = c("condition", "A", "b"))" gave error: "Error in normalizeSingleBracketSubscript(j, x) : subscript contains invalid names" This code gives an error because lowercase "b" is not a level. (It should be giving a more understandable error message, not sure why it is not doing so here.) > 3.) In order to get DE genes between conditions not directly listed in the "resultsNames", is the following codes correct? should "batch_2_vs_1" be included int the contrast?##for example, DE between C and F, controlling for batch effect;res.4 <- results(dds, contrast = list("condition_C_vs_A", "condition_F_vs_A") Check the help for ?results and the examples there. You should simply use: contrast = c("condition", "C", "F") Mike > Best, Chunxuan > > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Mike, Simon, Thanks very much for the quick answers. I moved from DESeq, in which like hood ratio test is used for model comparison, so the the idea just sticked to my mind. Later I saw on the other vignette there is a description of the similar situation specifying "which means that we want to test for the effect of treatment (the last factor), controlling for the effect of patients (the first factor)". Thanks again and best, Chunxuan Date: Wed, 20 Aug 2014 14:43:18 +0200 Subject: Re: [BioC] multiple level factors contrast in nbinomLRT, DESeq2. From: michaelisaiahlove@gmail.com To: hibergo at outlook.com CC: bioconductor at r-project.org Hi Chunxuan, On Aug 20, 2014 12:38 PM, "sh. chunxuan" <hibergo at="" outlook.com=""> wrote: > > > > > Dear list, > I am not sure the correct way to interpret the nbinomLRT results in multiple level factors condition. > Here is a toy example. I would to find DE genes after controlling batch effect in the experiments, in which there are multiple levels. > ## make data library(DESeq2)dds <- makeExampleDESeqDataSet(m = 18)colData <- data.frame(row.names = rownames(colData(dds)), sample = colData(dds)$sample, condition = rep(LETTERS[1:6], each = 3), batch = factor(rep(c(1, 1, 2), 6)))dds <- DESeqDataSetFromMatrix(counts(dds), colData = colData, design = ~ batch + condition)dds <- estimateSizeFactors(dds)dds <- estimateDispersions(dds)## LRTestdds <- nbinomLRT(dds, reduced = formula(~ batch ))resultsNames(dds)[1] "Intercept" "batch_2_vs_1" "condition_B_vs_A" "condition_C_vs_A" "condition_D_vs_A" "condition_E_vs_A" "condition_F_vs_A"res.1 <- results(dds, name = "condition_B_vs_A")res.2 <- results(dds, contrast = list("condition_B_vs_A", "batch_2_vs_1") > Questions:1.) I would like to get the DE genes between B and A, while controlling for the batch effect. You should use the default Wald test for such a comparison, not the LRT. See the vignette for an explanation of the difference between the two tests. So simply use DESeq() and then use: results(dds, contrast = c("condition", "B", "A")) > Should I take "res.1" or "res.2", or both are wrong? and what is the correct way to do it? > 2.) Why "res.3 <- results(dds, contrast = c("condition", "A", "b"))" gave error: "Error in normalizeSingleBracketSubscript(j, x) : subscript contains invalid names" This code gives an error because lowercase "b" is not a level. (It should be giving a more understandable error message, not sure why it is not doing so here.) > 3.) In order to get DE genes between conditions not directly listed in the "resultsNames", is the following codes correct? should "batch_2_vs_1" be included int the contrast?##for example, DE between C and F, controlling for batch effect;res.4 <- results(dds, contrast = list("condition_C_vs_A", "condition_F_vs_A") Check the help for ?results and the examples there. You should simply use: contrast = c("condition", "C", "F") Mike > Best, Chunxuan > > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 374 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6