Question

DESeq2 nbinomLRT results interpretation

0

Entering edit mode

Fix Ace ▴ 100

@fix-ace-5689

Last seen 9.8 years ago

United States

I have RNAseq data from 5 different developmental stages and try to use DESeq2 to find the differentially expressed genelist along any stages

I used the following code following the manual

ddsLRT=nbinomLRT(OD_dds, reduced=~ 1)

I am not a stat person and so my question is about the results:

since the manual said the LRT test here is the test on the difference in deviance between a full and reduced model, do those p values indicating the significance of the differentially expressed genes?

Is it a way to output the fitted estimate of the gene expression level?

Thanks a lot for any input!!

Thanks!

deseq2 • 3.5k views

ADD COMMENT • link updated 10.2 years ago by Michael Love 43k • written 10.2 years ago by Fix Ace ▴ 100

score 0 · Answer 1 · 2014-10-28

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 10 hours ago

United States

These p-values are the probability of seeing such differences in counts under the null hypothesis that the developmental stage variable (all 5 stages) has no effect on the counts. So a gene which shows a difference at any of the stages would get a small p-value if the experiment has enough statistical power.

In the vignette section "3.10 Access to all calculated values", we show how to obtain the fitted means mu_ij.

For an explanation of the term, see the vignette section "4.1 The DESeq2 model" or the formulas in ?DESeq or the manuscript linked on the first page of the vignette.

ADD COMMENT • link 10.2 years ago Michael Love 43k

0

Entering edit mode

My understanding based on the manual is that the differentially expressed gene list obtained is still pair wise comparison.

That is, if I have A, B, C, D, E samples, DESeq2 can only give me something like, B, C, D, E vs A, instead of differences between any two samples...

Is it correct?

Thanks a lot!!

ADD REPLY • link 10.2 years ago Fix Ace ▴ 100

0

Entering edit mode

The log2 fold change column is a comparison of two groups, but this column is separate from the p-value which is calculated as I said above. This is indicated in the print out above the results table and if you look at mcols(res).

ADD REPLY • link 10.2 years ago Michael Love 43k

0

Entering edit mode

May I resurrect this thread if possible, as the LRT p.val is something I have been confused about.

"So a gene which shows a difference at any of the stages would get a small p-value if the experiment has enough statistical power"

Does this mean if doing a time course (or dev stage as example above) that the LRT would call significance on a gene if differences in counts are seen on any of the time points(or dev stages). So LRT would call significance if gene x changes from time point dev stage 1-2. But that the change from 2-3 or 1-3 is not significant? I hope this question makes sense! And would be happy to start a new thread if that would be better.

I ask this as I am doing a time course study and want to look for changes in gene expression over time. So I would have though that the LRT would call genes significant.. if every change was significant... i.e, if change in gene expression from time 1-2, and then 2-3 and 3-4 were significant. Have I got this assumption wrong?

ADD REPLY • link 6.9 years ago A ▴ 40

0

Entering edit mode

The LRT would call a gene significant if there are any differences at any time point (assuming it has sufficient power). This may help: the only gene which would not be called significant is if the expression is flat across time.

ADD REPLY • link 6.9 years ago Michael Love 43k

0

Entering edit mode

This is my concern though. If you have 4 time points. And gene x changes (increases for example) from time point 1-2 signifcantly...but then decreases in a statistically insignificant manner, then how does one draw conclusions. If the P.val in results is a result of the LRT itself and not the logfold change for the the current pairwise comparison that page represents, then is there an output that shows p values after an LRT for true difference actually between the different time points?

Maybe I am fundamentally misunderstanding something... But, I am looking for specific patterns of change in time. So again, with the example above one significant change from point 1-2, but then not from 2-3 is would be concerning. Is there a way to exclude these results?

ADD REPLY • link 6.9 years ago A ▴ 40

0

Entering edit mode

I'd suggest partnering with a statistical collaborator. I try to use the forum to explain how to use the software, but you really need to know what statistical tests you are interested in. We have a lot of documentation about the different tests, but if this isn't helping, you should find someone locally to help guide the analysis.

ADD REPLY • link 6.9 years ago Michael Love 43k

0

Entering edit mode

Many thanks! Yes I will be partnering with someone to go through this properly. In the meantime, may I then ask an LRT related question (about the design specifically). I have seen other threads and just one to confirm I understand this.

I am looking for changes in time. I have 4 genotypes. Principally , I would like to see the genes that change with time across any of the samples before looking for patterns in gene expression. My design is as follows:

dds<-DESeqDataSetFromMatrix(countData = countdata, colData = coldatanormal, design = ~Time+Genotype )

dds<- DESeq(dds, reduced = ~Genotype , test = "LRT")

Is this the correct design to look for changes across time in general if the reduced model is ~genotype. In other words, will this tell me any gene that changes from time 0 onwards in any of the genotypes and that time is the variable by which genes change? Conversely, if the reduced is ~time, does this give me genes that change ONLY as a result of genotype differences?

If I then make the full model ~Time+Genotype+Time:Genotype, and the reduced is time + genotype, are we looking for changes here that are both DE in time AND because of genotype too?

If the reduced model then becomes time only with a full design of ~Time+Genotype+Time:Genotype, what does this mean? and why would one would one use a reduced model of time or genotype with this full design as oppose to using it with a full design of ~Time and Genotype.

Thank you :) A

ADD REPLY • link 6.9 years ago A ▴ 40

0

Entering edit mode

"Is this the correct design to look for changes across time in general if the reduced model is ~genotype. In other words, will this tell me any gene that changes from time 0 onwards in any of the genotypes and that time is the variable by which genes change? Conversely, if the reduced is ~time, does this give me genes that change ONLY as a result of genotype differences?"

Yes. Except that this model is fitting a single time profile for all genotypes. So there is no such thing here as the changes for a specific genotype.

"If I then make the full model ~Time+Genotype+Time:Genotype, and the reduced is time + genotype, are we looking for changes here that are both DE in time AND because of genotype too?"

No. This would be to test if any of the changes over time are different across genotype (you can rephrase this in the other order, it's an equivalent statement)

"If the reduced model then becomes time only with a full design of ~Time+Genotype+Time:Genotype, what does this mean?"

In general, you can interpret a LRT like this: you are testing whether the coefficients that you've removed in the reduced model can explain differences in the counts.

ADD REPLY • link 6.9 years ago Michael Love 43k

0

Entering edit mode

Thank you so so much! That explains it perfectly! And thank you for being patient with me and all my questions!

As a quick follow up to your last point:with a full model of time+Genotype+Time:Genotype, is the reduced model of ~Time alone, or ~Genotype alone (Not reduced of Time+ and Genotype together) redundant?

In other words, would Full= ~Time+Genotype+Time:Genotype and reduced = ~Time only OR ~Genotype only be the equivalent of running these reduced models under a full model of Full=~Time + Genotype WITHOUT the addition of Time:Genotype

So would the addition of Time:Genotype as you mention above only be useful for linking differences in gene changes across genotype at given times

Thanks again!

ADD REPLY • link 6.9 years ago A ▴ 40

0

Entering edit mode

Sorry I’m not able to interpret these questions. I think you should pose these to collaborator.

ADD REPLY • link 6.9 years ago Michael Love 43k

0

Entering edit mode

Sorry it was confusing...The question was basically, is Full= ~Time+Genotype+Time:Genotype and reduced ~Genotype the same as:

Full= ~Time+Genotype and reduced ~Genotype.

ADD REPLY • link 6.9 years ago A ▴ 40