DESeq2: likelihood ratio test
1
1
Entering edit mode
Quynh Tran ▴ 30
@quynh-tran-6671
Last seen 10.4 years ago
Hi, I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code: neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data, colData = neuron.mapping.data, design = ~ Gender+Disease) neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) The LRT test the full vs reduced model. So, the null model is reduce and the alternative model is full. When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only??? Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1. Why is this the case? Thanks, Quynh
• 11k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 11 hours ago
United States
hi Quynh, It's helpful to include the code and output of sessionInfo() for more detailed answers. more answers inline: On Mon, Aug 25, 2014 at 8:24 AM, Quynh Tran <qtran1 at="" memphis.edu=""> wrote: > Hi, > > I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code: > > neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data, > colData = neuron.mapping.data, > design = ~ Gender+Disease) > neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) > > The LRT test the full vs reduced model. So, the null model is reduce and the alternative model is full. When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only??? > Yes, except the null includes genes affected by gender or not at all. So we typically say "testing for the effect of disease, controlling for gender" > Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1. Why is this the case? > In the help page for ?results, we have: "For analyses using the likelihood ratio test (using nbinomLRT), the p-values are determined solely by the difference in deviance between the full and reduced model formula. A log2 fold change is included, which can be controlled using the name argument, or by default this will be the estimated coefficient for the last element of resultsNames(object)." It should also specify this information when you show the results object if you are using version >= 1.4, e.g.: log2 fold change: condition 2 vs 1 LRT p-value: '~ condition' vs '~ 1' ... If you want to test comparisons of pairs of levels of disease, you should be using the Wald test (the standard DESeq() workflow), instead of the LRT. The LRT is for testing all levels of disease at once. When you use the LRT, using the 'name' argument only changes the columns of the results which have to do with the LFC's. For the current release of DESeq2 (v1.4), when you use the 'contrast' argument, the p-values are replaced with Wald test p-values. In the development branch, I am working on having more specific control of whether or not to replace the LRT p-values with Wald p-values. For now, you can build whatever results table you like by combining the LFC columns for all contrasts with the LRT statistic, p-values and adjusted p-value columns. Mike > Thanks, > Quynh > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Thank you, Mike. I have another question. I remember you mentioned that LRT results is similar to the union of contrast Wald tests. I have 80 genes significant by LRT, and 145 genes significant when combining all 3 comparisons. Are these numbers reasonable? dpsc.dds <- DESeqDataSetFromMatrix(countData = dpsc.counts.data, colData = dpsc.mapping.data, design = ~ Gender+Disease) dpsc.dds.LRT <- DESeq(dpsc.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) dpsc.res.LRT <- results(dpsc.dds.LRT) dpsc.res <- results(dpsc.dds) dpsc.res$padj.15qvsCon <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "Control"))$padj dpsc.res$padj.15qvsAS <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "AS.Deletion"))$padj My session info() ## R version 3.1.0 (2014-04-10) ## Platform: x86_64-apple-darwin10.8.0 (64-bit) ## ## locale: ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 ## ## attached base packages: ## [1] parallel stats graphics grDevices utils datasets methods ## [8] base ## ## other attached packages: ## [1] DESeq2_1.4.5 RcppArmadillo_0.4.320.0 Rcpp_0.11.2 ## [4] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.10 ## [7] BiocGenerics_0.10.0 ## ## loaded via a namespace (and not attached): ## [1] annotate_1.42.1 AnnotationDbi_1.26.0 Biobase_2.24.0 ## [4] DBI_0.2-7 digest_0.6.4 evaluate_0.5.5 ## [7] formatR_0.10 genefilter_1.46.1 geneplotter_1.42.0 ## [10] grid_3.1.0 htmltools_0.2.4 knitr_1.6 ## [13] lattice_0.20-29 locfit_1.5-9.1 RColorBrewer_1.0-5 ## [16] rmarkdown_0.2.49 RSQLite_0.11.4 splines_3.1.0 ## [19] stats4_3.1.0 stringr_0.6.2 survival_2.37-7 ## [22] tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 ## [25] XVector_0.4.0 On Aug 25, 2014, at 12:07 PM, Michael Love wrote: > hi Quynh, > > It's helpful to include the code and output of sessionInfo() for more > detailed answers. > > more answers inline: > > > > > On Mon, Aug 25, 2014 at 8:24 AM, Quynh Tran <qtran1 at="" memphis.edu=""> wrote: >> Hi, >> >> I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code: >> >> neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data, >> colData = neuron.mapping.data, >> design = ~ Gender+Disease) >> neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) >> >> The LRT test the full vs reduced model. So, the null model is reduce and the alternative model is full. When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only??? >> > > Yes, except the null includes genes affected by gender or not at all. > > So we typically say "testing for the effect of disease, controlling for gender" > >> Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1. Why is this the case? >> > > In the help page for ?results, we have: > > "For analyses using the likelihood ratio test (using nbinomLRT), the > p-values are determined solely by the difference in deviance between > the full and reduced model formula. A log2 fold change is included, > which can be controlled using the name argument, or by default this > will be the estimated coefficient for the last element of > resultsNames(object)." > > It should also specify this information when you show the results > object if you are using version >= 1.4, e.g.: > > log2 fold change: condition 2 vs 1 > LRT p-value: '~ condition' vs '~ 1' > ... > > If you want to test comparisons of pairs of levels of disease, you > should be using the Wald test (the standard DESeq() workflow), instead > of the LRT. The LRT is for testing all levels of disease at once. > > When you use the LRT, using the 'name' argument only changes the > columns of the results which have to do with the LFC's. For the > current release of DESeq2 (v1.4), when you use the 'contrast' > argument, the p-values are replaced with Wald test p-values. In the > development branch, I am working on having more specific control of > whether or not to replace the LRT p-values with Wald p-values. For > now, you can build whatever results table you like by combining the > LFC columns for all contrasts with the LRT statistic, p-values and > adjusted p-value columns. > > Mike > >> Thanks, >> Quynh >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Thank you, Mike. I have another question. I remember you mentioned that LRT results is similar to the union of contrast Wald tests. I have 80 genes significant by LRT, and 145 genes significant when combining all 3 comparisons. Are these numbers reasonable? dpsc.dds <- DESeqDataSetFromMatrix(countData = dpsc.counts.data, colData = dpsc.mapping.data, design = ~ Gender+Disease) dpsc.dds.LRT <- DESeq(dpsc.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) dpsc.res.LRT <- results(dpsc.dds.LRT) dpsc.res <- results(dpsc.dds) dpsc.res$padj.15qvsCon <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "Control"))$padj dpsc.res$padj.15qvsAS <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "AS.Deletion"))$padj My session info() ## R version 3.1.0 (2014-04-10) ## Platform: x86_64-apple-darwin10.8.0 (64-bit) ## ## locale: ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 ## ## attached base packages: ## [1] parallel stats graphics grDevices utils datasets methods ## [8] base ## ## other attached packages: ## [1] DESeq2_1.4.5 RcppArmadillo_0.4.320.0 Rcpp_0.11.2 ## [4] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.10 ## [7] BiocGenerics_0.10.0 ## ## loaded via a namespace (and not attached): ## [1] annotate_1.42.1 AnnotationDbi_1.26.0 Biobase_2.24.0 ## [4] DBI_0.2-7 digest_0.6.4 evaluate_0.5.5 ## [7] formatR_0.10 genefilter_1.46.1 geneplotter_1.42.0 ## [10] grid_3.1.0 htmltools_0.2.4 knitr_1.6 ## [13] lattice_0.20-29 locfit_1.5-9.1 RColorBrewer_1.0-5 ## [16] rmarkdown_0.2.49 RSQLite_0.11.4 splines_3.1.0 ## [19] stats4_3.1.0 stringr_0.6.2 survival_2.37-7 ## [22] tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 ## [25] XVector_0.4.0 ___________________________ Nhu Quynh T. Tran, PhD Assistant Professor of Preventive Medicine University of Tennessee Health Science Center 66 N. Pauline, Suite 633 Memphis, TN 38015 Phone: 901-448-1361 qtran1 at uthsc.edu<mailto:qtran1 at="" uthsc.edu=""> On Aug 25, 2014, at 12:07 PM, Michael Love wrote: hi Quynh, It's helpful to include the code and output of sessionInfo() for more detailed answers. more answers inline: On Mon, Aug 25, 2014 at 8:24 AM, Quynh Tran <qtran1 at="" memphis.edu<mailto:qtran1="" at="" memphis.edu="">> wrote: Hi, I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code: neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data, colData = neuron.mapping.data, design = ~ Gender+Disease) neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) The LRT test the full vs reduced model. So, the null model is reduce and the alternative model is full. When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only??? Yes, except the null includes genes affected by gender or not at all. So we typically say "testing for the effect of disease, controlling for gender" Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1. Why is this the case? In the help page for ?results, we have: "For analyses using the likelihood ratio test (using nbinomLRT), the p-values are determined solely by the difference in deviance between the full and reduced model formula. A log2 fold change is included, which can be controlled using the name argument, or by default this will be the estimated coefficient for the last element of resultsNames(object)." It should also specify this information when you show the results object if you are using version >= 1.4, e.g.: log2 fold change: condition 2 vs 1 LRT p-value: '~ condition' vs '~ 1' ... If you want to test comparisons of pairs of levels of disease, you should be using the Wald test (the standard DESeq() workflow), instead of the LRT. The LRT is for testing all levels of disease at once. When you use the LRT, using the 'name' argument only changes the columns of the results which have to do with the LFC's. For the current release of DESeq2 (v1.4), when you use the 'contrast' argument, the p-values are replaced with Wald test p-values. In the development branch, I am working on having more specific control of whether or not to replace the LRT p-values with Wald p-values. For now, you can build whatever results table you like by combining the LFC columns for all contrasts with the LRT statistic, p-values and adjusted p-value columns. Mike Thanks, Quynh _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor _______________________________________________ Bioconductor mailing list Bioconductor at r-project.org<mailto:bioconductor at="" r-project.org=""> https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Hi Quynh, On Aug 26, 2014 5:48 PM, "Tran, Nhu Quynh T" <qtran1 at="" uthsc.edu=""> wrote: > > Thank you, Mike. > > I have another question. I remember you mentioned that LRT results is similar to the union of contrast Wald tests. Not exactly. While the LRT is a test of significance for differences of any level of the factor, I would not expect it to be exactly equal to the union of sets of genes using Wald tests. Use the LRT if you want to describe the set of genes from any level of disease. Use the Wald tests if you want to describe the set of genes from pairs of levels. > I have 80 genes significant by LRT, and 145 genes significant when combining all 3 comparisons. Are these numbers reasonable? > > dpsc.dds <- DESeqDataSetFromMatrix(countData = dpsc.counts.data, > colData = dpsc.mapping.data, > design = ~ Gender+Disease) > > dpsc.dds.LRT <- DESeq(dpsc.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) > dpsc.res.LRT <- results(dpsc.dds.LRT) > > dpsc.res <- results(dpsc.dds) > dpsc.res$padj.15qvsCon <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "Control"))$padj > dpsc.res$padj.15qvsAS <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "AS.Deletion"))$padj > > My session info() > > ## R version 3.1.0 (2014-04-10) > ## Platform: x86_64-apple-darwin10.8.0 (64-bit) > ## > ## locale: > ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > ## > ## attached base packages: > ## [1] parallel stats graphics grDevices utils datasets methods > ## [8] base > ## > ## other attached packages: > ## [1] DESeq2_1.4.5 RcppArmadillo_0.4.320.0 Rcpp_0.11.2 > ## [4] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.10 > ## [7] BiocGenerics_0.10.0 > ## > ## loaded via a namespace (and not attached): > ## [1] annotate_1.42.1 AnnotationDbi_1.26.0 Biobase_2.24.0 > ## [4] DBI_0.2-7 digest_0.6.4 evaluate_0.5.5 > ## [7] formatR_0.10 genefilter_1.46.1 geneplotter_1.42.0 > ## [10] grid_3.1.0 htmltools_0.2.4 knitr_1.6 > ## [13] lattice_0.20-29 locfit_1.5-9.1 RColorBrewer_1.0-5 > ## [16] rmarkdown_0.2.49 RSQLite_0.11.4 splines_3.1.0 > ## [19] stats4_3.1.0 stringr_0.6.2 survival_2.37-7 > ## [22] tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 > ## [25] XVector_0.4.0 > > > > ___________________________ > Nhu Quynh T. Tran, PhD > Assistant Professor of Preventive Medicine > University of Tennessee Health Science Center > 66 N. Pauline, Suite 633 > Memphis, TN 38015 > Phone: 901-448-1361 > qtran1 at uthsc.edu > > > > > On Aug 25, 2014, at 12:07 PM, Michael Love wrote: > >> hi Quynh, >> >> It's helpful to include the code and output of sessionInfo() for more >> detailed answers. >> >> more answers inline: >> >> >> >> >> On Mon, Aug 25, 2014 at 8:24 AM, Quynh Tran <qtran1 at="" memphis.edu=""> wrote: >>> >>> Hi, >>> >>> >>> I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code: >>> >>> >>> neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data, >>> >>> colData = neuron.mapping.data, >>> >>> design = ~ Gender+Disease) >>> >>> neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) >>> >>> >>> The LRT test the full vs reduced model. So, the null model is reduce and the alternative model is full. When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only??? >>> >>> >> >> Yes, except the null includes genes affected by gender or not at all. >> >> So we typically say "testing for the effect of disease, controlling for gender" >> >>> Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1. Why is this the case? >>> >>> >> >> In the help page for ?results, we have: >> >> "For analyses using the likelihood ratio test (using nbinomLRT), the >> p-values are determined solely by the difference in deviance between >> the full and reduced model formula. A log2 fold change is included, >> which can be controlled using the name argument, or by default this >> will be the estimated coefficient for the last element of >> resultsNames(object)." >> >> It should also specify this information when you show the results >> object if you are using version >= 1.4, e.g.: >> >> log2 fold change: condition 2 vs 1 >> LRT p-value: '~ condition' vs '~ 1' >> ... >> >> If you want to test comparisons of pairs of levels of disease, you >> should be using the Wald test (the standard DESeq() workflow), instead >> of the LRT. The LRT is for testing all levels of disease at once. >> >> When you use the LRT, using the 'name' argument only changes the >> columns of the results which have to do with the LFC's. For the >> current release of DESeq2 (v1.4), when you use the 'contrast' >> argument, the p-values are replaced with Wald test p-values. In the >> development branch, I am working on having more specific control of >> whether or not to replace the LRT p-values with Wald p-values. For >> now, you can build whatever results table you like by combining the >> LFC columns for all contrasts with the LRT statistic, p-values and >> adjusted p-value columns. >> >> Mike >> >>> Thanks, >>> >>> Quynh >>> >>> _______________________________________________ >>> >>> Bioconductor mailing list >>> >>> Bioconductor at r-project.org >>> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Thank you, Mike! Quynh On Aug 26, 2014, at 11:22 AM, Michael Love wrote: > Hi Quynh, > > On Aug 26, 2014 5:48 PM, "Tran, Nhu Quynh T" <qtran1 at="" uthsc.edu=""> wrote: > > > > Thank you, Mike. > > > > I have another question. I remember you mentioned that LRT results is similar to the union of contrast Wald tests. > > Not exactly. > > While the LRT is a test of significance for differences of any level of the factor, I would not expect it to be exactly equal to the union of sets of genes using Wald tests. > > Use the LRT if you want to describe the set of genes from any level of disease. Use the Wald tests if you want to describe the set of genes from pairs of levels. > > > I have 80 genes significant by LRT, and 145 genes significant when combining all 3 comparisons. Are these numbers reasonable? > > > > dpsc.dds <- DESeqDataSetFromMatrix(countData = dpsc.counts.data, > > colData = dpsc.mapping.data, > > design = ~ Gender+Disease) > > > > dpsc.dds.LRT <- DESeq(dpsc.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) > > dpsc.res.LRT <- results(dpsc.dds.LRT) > > > > dpsc.res <- results(dpsc.dds) > > dpsc.res$padj.15qvsCon <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "Control"))$padj > > dpsc.res$padj.15qvsAS <- results(dpsc.dds, contrast=c("Disease","15q.Duplication", "AS.Deletion"))$padj > > > > My session info() > > > > ## R version 3.1.0 (2014-04-10) > > ## Platform: x86_64-apple-darwin10.8.0 (64-bit) > > ## > > ## locale: > > ## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 > > ## > > ## attached base packages: > > ## [1] parallel stats graphics grDevices utils datasets methods > > ## [8] base > > ## > > ## other attached packages: > > ## [1] DESeq2_1.4.5 RcppArmadillo_0.4.320.0 Rcpp_0.11.2 > > ## [4] GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 IRanges_1.22.10 > > ## [7] BiocGenerics_0.10.0 > > ## > > ## loaded via a namespace (and not attached): > > ## [1] annotate_1.42.1 AnnotationDbi_1.26.0 Biobase_2.24.0 > > ## [4] DBI_0.2-7 digest_0.6.4 evaluate_0.5.5 > > ## [7] formatR_0.10 genefilter_1.46.1 geneplotter_1.42.0 > > ## [10] grid_3.1.0 htmltools_0.2.4 knitr_1.6 > > ## [13] lattice_0.20-29 locfit_1.5-9.1 RColorBrewer_1.0-5 > > ## [16] rmarkdown_0.2.49 RSQLite_0.11.4 splines_3.1.0 > > ## [19] stats4_3.1.0 stringr_0.6.2 survival_2.37-7 > > ## [22] tools_3.1.0 XML_3.98-1.1 xtable_1.7-3 > > ## [25] XVector_0.4.0 > > > > > > > > ___________________________ > > Nhu Quynh T. Tran, PhD > > Assistant Professor of Preventive Medicine > > University of Tennessee Health Science Center > > 66 N. Pauline, Suite 633 > > Memphis, TN 38015 > > Phone: 901-448-1361 > > qtran1 at uthsc.edu > > > > > > > > > > On Aug 25, 2014, at 12:07 PM, Michael Love wrote: > > > >> hi Quynh, > >> > >> It's helpful to include the code and output of sessionInfo() for more > >> detailed answers. > >> > >> more answers inline: > >> > >> > >> > >> > >> On Mon, Aug 25, 2014 at 8:24 AM, Quynh Tran <qtran1 at="" memphis.edu=""> wrote: > >>> > >>> Hi, > >>> > >>> > >>> I just want to make sure that I understand the LRT correctly. My purpose is to test if Gender is a confounder in gene expressions across diseases (control, disease1, disease2). I have this code: > >>> > >>> > >>> neu.dds <- DESeqDataSetFromMatrix(countData = neuron.counts.data, > >>> > >>> colData = neuron.mapping.data, > >>> > >>> design = ~ Gender+Disease) > >>> > >>> neu.dds.LRT <- DESeq(neu.dds,betaPrior=FALSE, test="LRT", full=~Gender+Disease, reduced=~Gender) > >>> > >>> > >>> The LRT test the full vs reduced model. So, the null model is reduce and the alternative model is full. When a gene has a p-value < 0.05, we conclude the gene expression changes at some disease in the presence of gender, while others only affected by Gender only??? > >>> > >>> > >> > >> Yes, except the null includes genes affected by gender or not at all. > >> > >> So we typically say "testing for the effect of disease, controlling for gender" > >> > >>> Also, since my disease has 3 levels, I noticed that the p-values for disease2 vs control are the same as for disease1 vs control for the LRT, but not the same for the disease2 vs disease1. Specifically, I have 80 genes with padj< 0.05 for D1 vs Con and D2 vs Con, but have 139 genes with padj <0.05 for D2 vs D1. Why is this the case? > >>> > >>> > >> > >> In the help page for ?results, we have: > >> > >> "For analyses using the likelihood ratio test (using nbinomLRT), the > >> p-values are determined solely by the difference in deviance between > >> the full and reduced model formula. A log2 fold change is included, > >> which can be controlled using the name argument, or by default this > >> will be the estimated coefficient for the last element of > >> resultsNames(object)." > >> > >> It should also specify this information when you show the results > >> object if you are using version >= 1.4, e.g.: > >> > >> log2 fold change: condition 2 vs 1 > >> LRT p-value: '~ condition' vs '~ 1' > >> ... > >> > >> If you want to test comparisons of pairs of levels of disease, you > >> should be using the Wald test (the standard DESeq() workflow), instead > >> of the LRT. The LRT is for testing all levels of disease at once. > >> > >> When you use the LRT, using the 'name' argument only changes the > >> columns of the results which have to do with the LFC's. For the > >> current release of DESeq2 (v1.4), when you use the 'contrast' > >> argument, the p-values are replaced with Wald test p-values. In the > >> development branch, I am working on having more specific control of > >> whether or not to replace the LRT p-values with Wald p-values. For > >> now, you can build whatever results table you like by combining the > >> LFC columns for all contrasts with the LRT statistic, p-values and > >> adjusted p-value columns. > >> > >> Mike > >> > >>> Thanks, > >>> > >>> Quynh > >>> > >>> _______________________________________________ > >>> > >>> Bioconductor mailing list > >>> > >>> Bioconductor at r-project.org > >>> > >>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>> > >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor at r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 364 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6