Using DESeq2: experimental design and extracting results

1

Entering edit mode

bio2001 ▴ 60

@bio2001-6597

Last seen 7.6 years ago

Hi Michael, I'd like your suggestions on an additional question related to this previous thread on DESeq2. You might be aware that when publishing differential expression results, it is a convention to provide the mean and standard deviation values (after the normalization) for the genes being tested. The DE table output output by DESeq has a column "baseMean". Is this equivalent to the mean expression? Is there a way to get standard deviation? Can the mean and SD be obtained separately for both the treatments being compared? Thanks Sridhar Acharya On 6/11/2014 10:32 AM, Michael Love wrote: > hi Sridhar, > > > > On Wed, Jun 11, 2014 at 9:51 AM, Sridhar A Malkaram > <smalkaram at="" wvstateu.edu=""> wrote: >> Hi Michael, >> >> Thanks for explaining in detail. >> Just for clarification, I think we should add the "~" before the >> "genotype + time", right? > Yes good catch. I should have written "~ genotype + time" > >> Does "~" stand for formula notation in R? > Yes. > >> dds <- DESeq(dds, test="LRT", reduced=genotype + time) >> >> >> Also, when I do, >> >> dds <- nbinomWaldTest(dds, betaPrior=FALSE) >> res<-results(dds, name="time2.gen2") >> head(res) >> >> The first two lines of the output: >> log2 fold change: time2.gen2 >> Wald test p-value: time2.gen2 >> ..... >> >> What two components is the log2 fold change between? I mean, how do I read time2.gen2 ? >> How can I get the two expression values that are being compared. The older version of DESeq used to give these values in the results. >> > The time series model with interactions between time and a condition > variable is a bit more complicated than the simple group comparisons. > There are not simply two levels being compared. The way that > interaction effects are encoded in model matrices in R is to write the > two terms of which the interaction is composed next to each other. We > use this convention as well. I'll try to explain the meaning of the > model you have specified. The model fits a number of main effects: > > Firstly, we have the change in expression levels over time for the > base level of the condition variable (so gen1). These effects are the > fold change (or addition in the log2 space) of the time 2 over time1 > (the intercept, or base level), time 3 over time1, time 4 over time 1. > If you specify results(dds, name="time_2_vs_1") you would get a > results table with a simple description of the comparison. But this is > probably not of interest for you for testing, as it doesn't tell you > anything about the effect of the experimental condition of interest, > e.g. the genotype. > > Secondly, we have the fold change of genotype 2 over 1 for the first > time period. This is also obtained with results(dds, > name="gen_2_vs_1") > > Finally are the interaction terms. For example, "gen2.time2" is the > fold change for genotype 2 at time 2 which is not explained by the > genotype 2 main fold change and the time 2 main fold change multiplied > together (or added in the log2 scale). If the log fold change of this > intercept term equals 0, then we say there is no interaction. In other > words, the effect of genotype 2 is the same at time 2 as it was at > time 1, after accounting for the general effect of time on expression. > > Mike > >> -- With Kind Regards, >> Sridhar Acharya >> >> >> >> >> >> >> >> On 6/10/2014 8:32 PM, Michael Love wrote: >>> I left out one detail, below: >>> >>> On Tue, Jun 10, 2014 at 8:20 PM, Michael Love >>> <michaelisaiahlove at="" gmail.com=""> wrote: >>>> hi Sridhar, >>>> >>>> Let's keep the discussion on the mailing list in case the question is >>>> relevant to others. >>>> >>>> After you have run: >>>> >>>> design(dds) <- ~ genotype + time + genotype:time >>>> dds <- DESeq(dds, test="LRT", reduced=genotype + time) >>>> res <- results(dds) >>>> >>>> The res object will contain the likelihood ratio test results, with >>>> small p-values for genes which have a genotype effect which is >>>> different than in the first time period. This tests all time periods >>>> after the first time period. >>>> >>>> You also see: >>>> >>>> resultsNames(ddsLRT) >>>> Intercept time2_vs_time1 time3_vs_time1 time4_vs_time1 >>>> gen2_vs_gen1 time2.gen2 time3.gen2 time4.gen2 >>>> >>>> The three of these which might be interesting for your experiment are: >>>> >>> Before the results line below you should call: >>> >>> dds <- nbinomWaldTest(dds, betaPrior=FALSE) >>> >>>> results(dds, name="time2.gen2") >>>> ...and same for time3.gen2, time4.gen2 >>>> >>>> which will return a results table with Wald tests of the additional >>>> genotype effect in time 2 (additional beyond the genotype effect in >>>> the first time period). This is similar to the first LRT results >>>> above, except now we are asking for a different effect of genotype in >>>> a specific time period, not in all time periods. >>>> >>>> The other coefficients are the main effect terms. Results tables for >>>> these can also be built by using the 'name' argument to results(). >>>> They are the intercept term, the effects of the different times over >>>> the initial time, and the effect for genotype 2 over 1 in the first >>>> time period. You don't want to use the contrast argument, which is for >>>> other kinds of models. >>>> >>>> Mike >>>> >>>> On Tue, Jun 10, 2014 at 3:38 PM, Michael Love >>>> <michaelisaiahlove at="" gmail.com=""> wrote: >>>>> hi Sridhar, >>>>> >>>>> On Tue, Jun 10, 2014 at 3:05 PM, Sridhar A Malkaram >>>>> <smalkaram at="" wvstateu.edu=""> wrote: >>>>>> Hi, >>>>>> >>>>>> >>>>>> I have been a user of DESeq and recently DESeq2 for my research work. >>>>>> The latest DESeq2 seem to offer extensive differential testing options >>>>>> suitable for various experimental designs. >>>>>> >>>>>> Recently I wanted to use DESeq for a differential gene expression >>>>>> analysis between two plant genotypes across 4 different time points. >>>>>> >>>>>> I am basically a biologist and am finding hard to grasp the concepts of >>>>>> testing results. I'd be very grateful if you could help me understand >>>>>> some concepts (especially resultsNames) related to the DESeq2 package. >>>>>> >>>>>> >>>>>> My experimental design is as below >>>>>> >>>>>> design<- ~ genotype + time + genotype:time >>>>>> >>>>>> There are two levels in genotype and 4 levels in time. >>>>>> Basically I'd like to use binomLRT test to check if there is any >>>>>> difference in gene expression between the genotypes across the time points. >>>>>> >>>>>> dds<-DESeq(dds) (dds is DESeq2 object obtained from, >>>>>> dds<-DESeqDataSetFromMatrix(countData=counts, colData=coldata, >>>>>> design=design) >>>>>> >>>>>> and I am using the reduced model for the liklihood test >>>>>> >>>>> Here is where things are getting confused. You have already run >>>>> DESeq() using test="Wald". So it doesn't make sense at this point to >>>>> instead perform a likelihood ratio test. In our vignette we explain >>>>> this in the section on the LRT: "The likelihood ratio test can also be >>>>> speci?ed using the test argument to DESeq, which substitutes >>>>> nbinomWaldTest with nbinomLRT." >>>>> >>>>>> Is the model correct per my research question (is there a (time >>>>>> influenced) difference between genotypes)? >>>>>> >>>>> Yes. If you want to find those genes which show a time influenced >>>>> difference between genotypes, this is simply: >>>>> >>>>> dds <- DESeq(dds, test="LRT", reduced=genotype + time) >>>>> res <- results(dds) >>>>> >>>>> You can then use heatmaps to inspect the patterns of gene expression >>>>> for the differentially expressed genes. Visualization with heatmaps >>>>> are covered in the vignette. >>>>> >>>>> If you have other more specific questions about how to generate >>>>> results tables, I can answer them. With time series experiments, there >>>>> are many possible combinations to test, but rather than going through >>>>> all combinations, we recommend that users explore the results with >>>>> heatmaps. >>>>> >>>>> Mike >>>>> >>>>>> Thanks, >>>>>> Sridhar Acharya >>>>>> >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at r-project.org >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

Visualization DESeq DESeq2 Visualization DESeq DESeq2 • 2.4k views

ADD COMMENT • link updated 10.5 years ago by Michael Love 43k • written 10.5 years ago by bio2001 ▴ 60

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 38 minutes ago

United States

hi Sridhar, The base mean is the mean of normalized counts for all samples. A few weeks ago I wrote a reply showing how to calculate the mean for each group, and you can easily modify this code to get the standard deviation or any summary. https://stat.ethz.ch/pipermail/bioconductor/2014-July/060631.html Mike On Wed, Jul 30, 2014 at 1:13 PM, Sridhar A Malkaram <smalkaram at="" wvstateu.edu=""> wrote: > Hi Michael, > > I'd like your suggestions on an additional question related to this > previous thread on DESeq2. > > You might be aware that when publishing differential expression results, > it is a convention to provide the mean and standard deviation values > (after the normalization) for the genes being tested. The DE table > output output by DESeq has a column "baseMean". Is this equivalent to > the mean expression? Is there a way to get standard deviation? Can the > mean and SD be obtained separately for both the treatments being compared? > > Thanks > > Sridhar Acharya > > > > On 6/11/2014 10:32 AM, Michael Love wrote: >> hi Sridhar, >> >> >> >> On Wed, Jun 11, 2014 at 9:51 AM, Sridhar A Malkaram >> <smalkaram at="" wvstateu.edu=""> wrote: >>> Hi Michael, >>> >>> Thanks for explaining in detail. >>> Just for clarification, I think we should add the "~" before the >>> "genotype + time", right? >> Yes good catch. I should have written "~ genotype + time" >> >>> Does "~" stand for formula notation in R? >> Yes. >> >>> dds <- DESeq(dds, test="LRT", reduced=genotype + time) >>> >>> >>> Also, when I do, >>> >>> dds <- nbinomWaldTest(dds, betaPrior=FALSE) >>> res<-results(dds, name="time2.gen2") >>> head(res) >>> >>> The first two lines of the output: >>> log2 fold change: time2.gen2 >>> Wald test p-value: time2.gen2 >>> ..... >>> >>> What two components is the log2 fold change between? I mean, how do I read time2.gen2 ? >>> How can I get the two expression values that are being compared. The older version of DESeq used to give these values in the results. >>> >> The time series model with interactions between time and a condition >> variable is a bit more complicated than the simple group comparisons. >> There are not simply two levels being compared. The way that >> interaction effects are encoded in model matrices in R is to write the >> two terms of which the interaction is composed next to each other. We >> use this convention as well. I'll try to explain the meaning of the >> model you have specified. The model fits a number of main effects: >> >> Firstly, we have the change in expression levels over time for the >> base level of the condition variable (so gen1). These effects are the >> fold change (or addition in the log2 space) of the time 2 over time1 >> (the intercept, or base level), time 3 over time1, time 4 over time 1. >> If you specify results(dds, name="time_2_vs_1") you would get a >> results table with a simple description of the comparison. But this is >> probably not of interest for you for testing, as it doesn't tell you >> anything about the effect of the experimental condition of interest, >> e.g. the genotype. >> >> Secondly, we have the fold change of genotype 2 over 1 for the first >> time period. This is also obtained with results(dds, >> name="gen_2_vs_1") >> >> Finally are the interaction terms. For example, "gen2.time2" is the >> fold change for genotype 2 at time 2 which is not explained by the >> genotype 2 main fold change and the time 2 main fold change multiplied >> together (or added in the log2 scale). If the log fold change of this >> intercept term equals 0, then we say there is no interaction. In other >> words, the effect of genotype 2 is the same at time 2 as it was at >> time 1, after accounting for the general effect of time on expression. >> >> Mike >> >>> -- With Kind Regards, >>> Sridhar Acharya >>> >>> >>> >>> >>> >>> >>> >>> On 6/10/2014 8:32 PM, Michael Love wrote: >>>> I left out one detail, below: >>>> >>>> On Tue, Jun 10, 2014 at 8:20 PM, Michael Love >>>> <michaelisaiahlove at="" gmail.com=""> wrote: >>>>> hi Sridhar, >>>>> >>>>> Let's keep the discussion on the mailing list in case the question is >>>>> relevant to others. >>>>> >>>>> After you have run: >>>>> >>>>> design(dds) <- ~ genotype + time + genotype:time >>>>> dds <- DESeq(dds, test="LRT", reduced=genotype + time) >>>>> res <- results(dds) >>>>> >>>>> The res object will contain the likelihood ratio test results, with >>>>> small p-values for genes which have a genotype effect which is >>>>> different than in the first time period. This tests all time periods >>>>> after the first time period. >>>>> >>>>> You also see: >>>>> >>>>> resultsNames(ddsLRT) >>>>> Intercept time2_vs_time1 time3_vs_time1 time4_vs_time1 >>>>> gen2_vs_gen1 time2.gen2 time3.gen2 time4.gen2 >>>>> >>>>> The three of these which might be interesting for your experiment are: >>>>> >>>> Before the results line below you should call: >>>> >>>> dds <- nbinomWaldTest(dds, betaPrior=FALSE) >>>> >>>>> results(dds, name="time2.gen2") >>>>> ...and same for time3.gen2, time4.gen2 >>>>> >>>>> which will return a results table with Wald tests of the additional >>>>> genotype effect in time 2 (additional beyond the genotype effect in >>>>> the first time period). This is similar to the first LRT results >>>>> above, except now we are asking for a different effect of genotype in >>>>> a specific time period, not in all time periods. >>>>> >>>>> The other coefficients are the main effect terms. Results tables for >>>>> these can also be built by using the 'name' argument to results(). >>>>> They are the intercept term, the effects of the different times over >>>>> the initial time, and the effect for genotype 2 over 1 in the first >>>>> time period. You don't want to use the contrast argument, which is for >>>>> other kinds of models. >>>>> >>>>> Mike >>>>> >>>>> On Tue, Jun 10, 2014 at 3:38 PM, Michael Love >>>>> <michaelisaiahlove at="" gmail.com=""> wrote: >>>>>> hi Sridhar, >>>>>> >>>>>> On Tue, Jun 10, 2014 at 3:05 PM, Sridhar A Malkaram >>>>>> <smalkaram at="" wvstateu.edu=""> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> >>>>>>> I have been a user of DESeq and recently DESeq2 for my research work. >>>>>>> The latest DESeq2 seem to offer extensive differential testing options >>>>>>> suitable for various experimental designs. >>>>>>> >>>>>>> Recently I wanted to use DESeq for a differential gene expression >>>>>>> analysis between two plant genotypes across 4 different time points. >>>>>>> >>>>>>> I am basically a biologist and am finding hard to grasp the concepts of >>>>>>> testing results. I'd be very grateful if you could help me understand >>>>>>> some concepts (especially resultsNames) related to the DESeq2 package. >>>>>>> >>>>>>> >>>>>>> My experimental design is as below >>>>>>> >>>>>>> design<- ~ genotype + time + genotype:time >>>>>>> >>>>>>> There are two levels in genotype and 4 levels in time. >>>>>>> Basically I'd like to use binomLRT test to check if there is any >>>>>>> difference in gene expression between the genotypes across the time points. >>>>>>> >>>>>>> dds<-DESeq(dds) (dds is DESeq2 object obtained from, >>>>>>> dds<-DESeqDataSetFromMatrix(countData=counts, colData=coldata, >>>>>>> design=design) >>>>>>> >>>>>>> and I am using the reduced model for the liklihood test >>>>>>> >>>>>> Here is where things are getting confused. You have already run >>>>>> DESeq() using test="Wald". So it doesn't make sense at this point to >>>>>> instead perform a likelihood ratio test. In our vignette we explain >>>>>> this in the section on the LRT: "The likelihood ratio test can also be >>>>>> speci?ed using the test argument to DESeq, which substitutes >>>>>> nbinomWaldTest with nbinomLRT." >>>>>> >>>>>>> Is the model correct per my research question (is there a (time >>>>>>> influenced) difference between genotypes)? >>>>>>> >>>>>> Yes. If you want to find those genes which show a time influenced >>>>>> difference between genotypes, this is simply: >>>>>> >>>>>> dds <- DESeq(dds, test="LRT", reduced=genotype + time) >>>>>> res <- results(dds) >>>>>> >>>>>> You can then use heatmaps to inspect the patterns of gene expression >>>>>> for the differentially expressed genes. Visualization with heatmaps >>>>>> are covered in the vignette. >>>>>> >>>>>> If you have other more specific questions about how to generate >>>>>> results tables, I can answer them. With time series experiments, there >>>>>> are many possible combinations to test, but rather than going through >>>>>> all combinations, we recommend that users explore the results with >>>>>> heatmaps. >>>>>> >>>>>> Mike >>>>>> >>>>>>> Thanks, >>>>>>> Sridhar Acharya >>>>>>> >>>>>>> >>>>>>> [[alternative HTML version deleted]] >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioconductor mailing list >>>>>>> Bioconductor at r-project.org >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 10.5 years ago Michael Love 43k

Login before adding your answer.