I have RNAseq data from 5 different developmental stages and try to use DESeq2 to find the differentially expressed genelist along any stages
I used the following code following the manual
ddsLRT=nbinomLRT(OD_dds, reduced=~ 1)
I am not a stat person and so my question is about the results:
since the manual said the LRT test here is the test on the difference in deviance between a full and reduced model, do those p values indicating the significance of the differentially expressed genes?
Is it a way to output the fitted estimate of the gene expression level?
Thanks a lot for any input!!
Thanks!
My understanding based on the manual is that the differentially expressed gene list obtained is still pair wise comparison.
That is, if I have A, B, C, D, E samples, DESeq2 can only give me something like, B, C, D, E vs A, instead of differences between any two samples...
Is it correct?
Thanks a lot!!
The log2 fold change column is a comparison of two groups, but this column is separate from the p-value which is calculated as I said above. This is indicated in the print out above the results table and if you look at mcols(res).
May I resurrect this thread if possible, as the LRT p.val is something I have been confused about.
"So a gene which shows a difference at any of the stages would get a small p-value if the experiment has enough statistical power"
Does this mean if doing a time course (or dev stage as example above) that the LRT would call significance on a gene if differences in counts are seen on any of the time points(or dev stages). So LRT would call significance if gene x changes from time point dev stage 1-2. But that the change from 2-3 or 1-3 is not significant? I hope this question makes sense! And would be happy to start a new thread if that would be better.
I ask this as I am doing a time course study and want to look for changes in gene expression over time. So I would have though that the LRT would call genes significant.. if every change was significant... i.e, if change in gene expression from time 1-2, and then 2-3 and 3-4 were significant. Have I got this assumption wrong?
The LRT would call a gene significant if there are any differences at any time point (assuming it has sufficient power). This may help: the only gene which would not be called significant is if the expression is flat across time.
This is my concern though. If you have 4 time points. And gene x changes (increases for example) from time point 1-2 signifcantly...but then decreases in a statistically insignificant manner, then how does one draw conclusions. If the P.val in results is a result of the LRT itself and not the logfold change for the the current pairwise comparison that page represents, then is there an output that shows p values after an LRT for true difference actually between the different time points?
Maybe I am fundamentally misunderstanding something... But, I am looking for specific patterns of change in time. So again, with the example above one significant change from point 1-2, but then not from 2-3 is would be concerning. Is there a way to exclude these results?
I'd suggest partnering with a statistical collaborator. I try to use the forum to explain how to use the software, but you really need to know what statistical tests you are interested in. We have a lot of documentation about the different tests, but if this isn't helping, you should find someone locally to help guide the analysis.
Many thanks! Yes I will be partnering with someone to go through this properly. In the meantime, may I then ask an LRT related question (about the design specifically). I have seen other threads and just one to confirm I understand this.
I am looking for changes in time. I have 4 genotypes. Principally , I would like to see the genes that change with time across any of the samples before looking for patterns in gene expression. My design is as follows:
dds<-DESeqDataSetFromMatrix(countData = countdata, colData = coldatanormal, design = ~Time+Genotype )
dds<- DESeq(dds, reduced = ~Genotype , test = "LRT")
Is this the correct design to look for changes across time in general if the reduced model is ~genotype. In other words, will this tell me any gene that changes from time 0 onwards in any of the genotypes and that time is the variable by which genes change? Conversely, if the reduced is ~time, does this give me genes that change ONLY as a result of genotype differences?
If I then make the full model ~Time+Genotype+Time:Genotype, and the reduced is time + genotype, are we looking for changes here that are both DE in time AND because of genotype too?
If the reduced model then becomes time only with a full design of ~Time+Genotype+Time:Genotype, what does this mean? and why would one would one use a reduced model of time or genotype with this full design as oppose to using it with a full design of ~Time and Genotype.
Thank you :) A
"Is this the correct design to look for changes across time in general if the reduced model is ~genotype. In other words, will this tell me any gene that changes from time 0 onwards in any of the genotypes and that time is the variable by which genes change? Conversely, if the reduced is ~time, does this give me genes that change ONLY as a result of genotype differences?"
Yes. Except that this model is fitting a single time profile for all genotypes. So there is no such thing here as the changes for a specific genotype.
"If I then make the full model ~Time+Genotype+Time:Genotype, and the reduced is time + genotype, are we looking for changes here that are both DE in time AND because of genotype too?"
No. This would be to test if any of the changes over time are different across genotype (you can rephrase this in the other order, it's an equivalent statement)
"If the reduced model then becomes time only with a full design of ~Time+Genotype+Time:Genotype, what does this mean?"
In general, you can interpret a LRT like this: you are testing whether the coefficients that you've removed in the reduced model can explain differences in the counts.
Thank you so so much! That explains it perfectly! And thank you for being patient with me and all my questions!
As a quick follow up to your last point:with a full model of time+Genotype+Time:Genotype, is the reduced model of ~Time alone, or ~Genotype alone (Not reduced of Time+ and Genotype together) redundant?
In other words, would Full= ~Time+Genotype+Time:Genotype and reduced = ~Time only OR ~Genotype only be the equivalent of running these reduced models under a full model of Full=~Time + Genotype WITHOUT the addition of Time:Genotype
So would the addition of Time:Genotype as you mention above only be useful for linking differences in gene changes across genotype at given times
Thanks again!
Sorry I’m not able to interpret these questions. I think you should pose these to collaborator.
Sorry it was confusing...The question was basically, is Full= ~Time+Genotype+Time:Genotype and reduced ~Genotype the same as:
Full= ~Time+Genotype and reduced ~Genotype.