DESeq2 test over all timepoints?
1
0
Entering edit mode
@hickman-rj-richard-5936
Last seen 10.4 years ago
Dear Mike (and other DESeq2 developers/users), I came across this thread regarding the use of the GLM functions in DESeq2 with respect to time series. However, if I try and perform the analysis you describe then I get an error when estimating the dispersions? do you know what is the cause and/or what I could be doing wrong? dds <- DESeqDataSetFromMatrix( countData = countMatrix, colData = colData, design = ~ time + treatment + treatment:time) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) gene-wise dispersion estimates Error in fitDisp(ySEXP = counts(objectNZ), xSEXP = fit$modelMatrix, mu_hatSEXP = fit$mu_hat, : in call to fitDisp, the following arguments contain NA: mu_hatSEXP The matrix of counts and the colData are OK, i think.. This is the colData df used: treatment time s1 treated T1 s2 treated T1 s3 treated T1 s4 untreated T1 s5 untreated T1 s6 untreated T1 s7 treated T2 s8 treated T2 s9 treated T2 s10 untreated T2 s11 untreated T2 s12 untreated T2 Bests, Richard # Session info: > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] grid parallel stats graphics grDevices utils datasets methods base other attached packages: [1] ggplot2_0.9.3.1 gplots_2.11.0 MASS_7.3-26 KernSmooth_2.23-10 [5] caTools_1.14 gdata_2.12.0 gtools_2.7.1 reshape_0.8.4 [9] plyr_1.8 pasilla_0.2.16 DESeq_1.12.0 locfit_1.5-9 [13] DEXSeq_1.6.0 parathyroidSE_0.99.5 DESeq2_1.0.9 RcppArmadillo_0.3.810.2 [17] Rcpp_0.10.3 lattice_0.20-15 Biobase_2.20.0 GenomicRanges_1.12.2 [21] IRanges_1.18.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationDbi_1.22.3 biomaRt_2.16.0 Biostrings_2.28.0 bitops_1.0-5 [6] colorspace_1.2-2 DBI_0.2-6 dichromat_2.0-0 digest_0.6.3 genefilter_1.42.0 [11] geneplotter_1.38.0 gtable_0.1.2 hwriter_1.3 labeling_0.1 munsell_0.4 [16] proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2 Rsamtools_1.12.2 [21] RSQLite_0.11.3 scales_0.2.3 splines_3.0.0 statmod_1.4.17 stats4_3.0.0 [26] stringr_0.6.2 survival_2.37-4 tools_3.0.0 XML_3.95-0.2 xtable_1.7-1 [31] zlibbioc_1.6.0 >hi Charles, >>On Tue, Jul 9, 2013 at 3:59 PM, Charles Determan Jr wrote: >>Greetings, >>I have used the DESeq package previously and have been recently using >>DESeq2. I am particularly interested in repeated measures designs and was >>wondering about applications with DESeq2. I have read through the manual >>and tried searching the archives but couldn't find too much direction for >>testing over all timepoints instead of just one at a time (ANOVA- like >>approach). Reading the edgeR manual, it provides an example in section >>3.3.4 that tests whether a treatment has an effect at any time by taking >>multiple coefficients (i.e. lrt <- glmLRT(fit, coef=5:6)). I attempted >>something similar with DESeq2: >>res <- results(dds, name=resultsNames(dds)[5:6] >>but I got the warning message saying only the first element used: >>Warning message:In if (paste0("WaldPvalue_", name) %in% >>names(mcols(object))) { : >>the condition has length > 1 and only the first element will be used >I should clean up the code to provide a warning here, as the results() >function should only accept a character vector of length 1 for the >argument 'name'. >The proper way to test for the significance of multiple coefficients >at once is to use the nbinomLRT() function in DESeq2 and specify a >reduced formula. To test whether the treatment effect at all times is >different than at the baseline time, the reduced formula would remove >the interaction term between treatment and time, so: >design(dds) <- formula(~ time + treatment + treatment:time) >dds <- estimateSizeFactors(dds) >dds <- estimateDispersions(dds) >dds <- nbinomLRT(dds, reduced = formula(~ time + treatment)) >res <- results(dds) >If you presume that the treatment effect is the same at all times, you >can test whether the treatment effect is equal to 0 with: ># using the Wald test and coefficient shrinkage >design(dds) <- formula(~ time + treatment) >dds <- DESeq(dds) >res <- results(dds) ># or using the likelihood ratio test as in the previous example >design(dds) <- formula(~ time + treatment) >dds <- estimateSizeFactors(dds) >dds <- estimateDispersions(dds) >dds <- nbinomLRT(dds, reduced = formula(~ time)) >res <- results(dds) >The main difference here between the Wald and LRT tests is the >shrinkage of estimated log2 fold changes to 0 using the default >DESeq() function/Wald test. >I will add more examples to the vignette to better explain these cases >of testing multiple coefficients. >Mike
edgeR DESeq DESeq2 edgeR DESeq DESeq2 • 2.6k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 3 hours ago
United States
Hi Richard, Can you upgrade to the release version of DESeq2 (v1.2)? If this is not possible try the latest version from v1.0. The version you are using is from the first months DESeq2 was released. Mike On Feb 12, 2014 7:27 AM, "Hickman, R.J. (Richard)" <r.j.hickman@uu.nl> wrote: > Dear Mike (and other DESeq2 developers/users), > > I came across this thread regarding the use of the GLM functions in DESeq2 > with respect to time series. However, if I try and perform the analysis you > describe then I get an error when estimating the dispersions— do you know > what is the cause and/or what I could be doing wrong? > > dds <- DESeqDataSetFromMatrix( countData = countMatrix, colData = colData, > design = ~ time + treatment + treatment:time) > dds <- estimateSizeFactors(dds) > dds <- estimateDispersions(dds) > gene-wise dispersion estimates > Error in fitDisp(ySEXP = counts(objectNZ), xSEXP = fit$modelMatrix, > mu_hatSEXP = fit$mu_hat, : > in call to fitDisp, the following arguments contain NA: mu_hatSEXP > > The matrix of counts and the colData are OK, i think.. > > This is the colData df used: > > treatment time > s1 treated T1 > s2 treated T1 > s3 treated T1 > s4 untreated T1 > s5 untreated T1 > s6 untreated T1 > s7 treated T2 > s8 treated T2 > s9 treated T2 > s10 untreated T2 > s11 untreated T2 > s12 untreated T2 > > Bests, > > Richard > > # Session info: > > > sessionInfo() > R version 3.0.0 (2013-04-03) > Platform: x86_64-apple-darwin10.8.0 (64-bit) > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] grid parallel stats graphics grDevices utils datasets > methods base > > other attached packages: > [1] ggplot2_0.9.3.1 gplots_2.11.0 MASS_7.3-26 > KernSmooth_2.23-10 > [5] caTools_1.14 gdata_2.12.0 gtools_2.7.1 > reshape_0.8.4 > [9] plyr_1.8 pasilla_0.2.16 DESeq_1.12.0 > locfit_1.5-9 > [13] DEXSeq_1.6.0 parathyroidSE_0.99.5 DESeq2_1.0.9 > RcppArmadillo_0.3.810.2 > [17] Rcpp_0.10.3 lattice_0.20-15 Biobase_2.20.0 > GenomicRanges_1.12.2 > [21] IRanges_1.18.0 BiocGenerics_0.6.0 > > loaded via a namespace (and not attached): > [1] annotate_1.38.0 AnnotationDbi_1.22.3 biomaRt_2.16.0 > Biostrings_2.28.0 bitops_1.0-5 > [6] colorspace_1.2-2 DBI_0.2-6 dichromat_2.0-0 > digest_0.6.3 genefilter_1.42.0 > [11] geneplotter_1.38.0 gtable_0.1.2 hwriter_1.3 > labeling_0.1 munsell_0.4 > [16] proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 > reshape2_1.2.2 Rsamtools_1.12.2 > [21] RSQLite_0.11.3 scales_0.2.3 splines_3.0.0 > statmod_1.4.17 stats4_3.0.0 > [26] stringr_0.6.2 survival_2.37-4 tools_3.0.0 > XML_3.95-0.2 xtable_1.7-1 > [31] zlibbioc_1.6.0 > > > >hi Charles, > > >>On Tue, Jul 9, 2013 at 3:59 PM, Charles Determan Jr wrote: > >>Greetings, > > >>I have used the DESeq package previously and have been recently using > >>DESeq2. I am particularly interested in repeated measures designs and was > >>wondering about applications with DESeq2. I have read through the manual > >>and tried searching the archives but couldn't find too much direction for > >>testing over all timepoints instead of just one at a time (ANOVA- like > >>approach). Reading the edgeR manual, it provides an example in section > >>3.3.4 that tests whether a treatment has an effect at any time by taking > >>multiple coefficients (i.e. lrt <- glmLRT(fit, coef=5:6)). I attempted > >>something similar with DESeq2: > > >>res <- results(dds, name=resultsNames(dds)[5:6] > > >>but I got the warning message saying only the first element used: > > >>Warning message:In if (paste0("WaldPvalue_", name) %in% > >>names(mcols(object))) { : > >>the condition has length > 1 and only the first element will be used > > >I should clean up the code to provide a warning here, as the results() > >function should only accept a character vector of length 1 for the > >argument 'name'. > > > >The proper way to test for the significance of multiple coefficients > >at once is to use the nbinomLRT() function in DESeq2 and specify a > >reduced formula. To test whether the treatment effect at all times is > >different than at the baseline time, the reduced formula would remove > >the interaction term between treatment and time, so: > > > >design(dds) <- formula(~ time + treatment + treatment:time) > >dds <- estimateSizeFactors(dds) > >dds <- estimateDispersions(dds) > >dds <- nbinomLRT(dds, reduced = formula(~ time + treatment)) > >res <- results(dds) > > > >If you presume that the treatment effect is the same at all times, you > >can test whether the treatment effect is equal to 0 with: > > > ># using the Wald test and coefficient shrinkage > >design(dds) <- formula(~ time + treatment) > >dds <- DESeq(dds) > >res <- results(dds) > > > ># or using the likelihood ratio test as in the previous example > >design(dds) <- formula(~ time + treatment) > >dds <- estimateSizeFactors(dds) > >dds <- estimateDispersions(dds) > >dds <- nbinomLRT(dds, reduced = formula(~ time)) > >res <- results(dds) > > > >The main difference here between the Wald and LRT tests is the > >shrinkage of estimated log2 fold changes to 0 using the default > >DESeq() function/Wald test. > > > >I will add more examples to the vignette to better explain these cases > >of testing multiple coefficients. > > > >Mike > > > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Mike, That appears to have worked- thanks! Richard ________________________________ From: Michael Love [michaelisaiahlove@gmail.com] Sent: 12 February 2014 14:53 To: Hickman, R.J. (Richard) Cc: bioconductor@r-project.org Subject: Re: [BioC] DESeq2 test over all timepoints? Hi Richard, Can you upgrade to the release version of DESeq2 (v1.2)? If this is not possible try the latest version from v1.0. The version you are using is from the first months DESeq2 was released. Mike On Feb 12, 2014 7:27 AM, "Hickman, R.J. (Richard)" <r.j.hickman@uu.nl<mailto:r.j.hickman@uu.nl>> wrote: Dear Mike (and other DESeq2 developers/users), I came across this thread regarding the use of the GLM functions in DESeq2 with respect to time series. However, if I try and perform the analysis you describe then I get an error when estimating the dispersions� do you know what is the cause and/or what I could be doing wrong? dds <- DESeqDataSetFromMatrix( countData = countMatrix, colData = colData, design = ~ time + treatment + treatment:time) dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) gene-wise dispersion estimates Error in fitDisp(ySEXP = counts(objectNZ), xSEXP = fit$modelMatrix, mu_hatSEXP = fit$mu_hat, : in call to fitDisp, the following arguments contain NA: mu_hatSEXP The matrix of counts and the colData are OK, i think.. This is the colData df used: treatment time s1 treated T1 s2 treated T1 s3 treated T1 s4 untreated T1 s5 untreated T1 s6 untreated T1 s7 treated T2 s8 treated T2 s9 treated T2 s10 untreated T2 s11 untreated T2 s12 untreated T2 Bests, Richard # Session info: > sessionInfo() R version 3.0.0 (2013-04-03) Platform: x86_64-apple-darwin10.8.0 (64-bit) locale: [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8 attached base packages: [1] grid parallel stats graphics grDevices utils datasets methods base other attached packages: [1] ggplot2_0.9.3.1 gplots_2.11.0 MASS_7.3-26 KernSmooth_2.23-10 [5] caTools_1.14 gdata_2.12.0 gtools_2.7.1 reshape_0.8.4 [9] plyr_1.8 pasilla_0.2.16 DESeq_1.12.0 locfit_1.5-9 [13] DEXSeq_1.6.0 parathyroidSE_0.99.5 DESeq2_1.0.9 RcppArmadillo_0.3.810.2 [17] Rcpp_0.10.3 lattice_0.20-15 Biobase_2.20.0 GenomicRanges_1.12.2 [21] IRanges_1.18.0 BiocGenerics_0.6.0 loaded via a namespace (and not attached): [1] annotate_1.38.0 AnnotationDbi_1.22.3 biomaRt_2.16.0 Biostrings_2.28.0 bitops_1.0-5 [6] colorspace_1.2-2 DBI_0.2-6 dichromat_2.0-0 digest_0.6.3 genefilter_1.42.0 [11] geneplotter_1.38.0 gtable_0.1.2 hwriter_1.3 labeling_0.1 munsell_0.4 [16] proto_0.3-10 RColorBrewer_1.0-5 RCurl_1.95-4.1 reshape2_1.2.2 Rsamtools_1.12.2 [21] RSQLite_0.11.3 scales_0.2.3 splines_3.0.0 statmod_1.4.17 stats4_3.0.0 [26] stringr_0.6.2 survival_2.37-4 tools_3.0.0 XML_3.95-0.2 xtable_1.7-1 [31] zlibbioc_1.6.0 >hi Charles, >>On Tue, Jul 9, 2013 at 3:59 PM, Charles Determan Jr wrote: >>Greetings, >>I have used the DESeq package previously and have been recently using >>DESeq2. I am particularly interested in repeated measures designs and was >>wondering about applications with DESeq2. I have read through the manual >>and tried searching the archives but couldn't find too much direction for >>testing over all timepoints instead of just one at a time (ANOVA- like >>approach). Reading the edgeR manual, it provides an example in section >>3.3.4 that tests whether a treatment has an effect at any time by taking >>multiple coefficients (i.e. lrt <- glmLRT(fit, coef=5:6)). I attempted >>something similar with DESeq2: >>res <- results(dds, name=resultsNames(dds)[5:6] >>but I got the warning message saying only the first element used: >>Warning message:In if (paste0("WaldPvalue_", name) %in% >>names(mcols(object))) { : >>the condition has length > 1 and only the first element will be used >I should clean up the code to provide a warning here, as the results() >function should only accept a character vector of length 1 for the >argument 'name'. >The proper way to test for the significance of multiple coefficients >at once is to use the nbinomLRT() function in DESeq2 and specify a >reduced formula. To test whether the treatment effect at all times is >different than at the baseline time, the reduced formula would remove >the interaction term between treatment and time, so: >design(dds) <- formula(~ time + treatment + treatment:time) >dds <- estimateSizeFactors(dds) >dds <- estimateDispersions(dds) >dds <- nbinomLRT(dds, reduced = formula(~ time + treatment)) >res <- results(dds) >If you presume that the treatment effect is the same at all times, you >can test whether the treatment effect is equal to 0 with: ># using the Wald test and coefficient shrinkage >design(dds) <- formula(~ time + treatment) >dds <- DESeq(dds) >res <- results(dds) ># or using the likelihood ratio test as in the previous example >design(dds) <- formula(~ time + treatment) >dds <- estimateSizeFactors(dds) >dds <- estimateDispersions(dds) >dds <- nbinomLRT(dds, reduced = formula(~ time)) >res <- results(dds) >The main difference here between the Wald and LRT tests is the >shrinkage of estimated log2 fold changes to 0 using the default >DESeq() function/Wald test. >I will add more examples to the vignette to better explain these cases >of testing multiple coefficients. >Mike [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 901 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6