EDIT:
I think this post might answer my question?
https://www.biostars.org/p/115685/
Hi there,
I was hoping for some clarification regarding the 'results' function of DESEQ2. I see from the manual it says that if 'contrast' is not specified, it will return 'results' from the last comparison. Is it then fair to say if your dataset is Control, ConditionA, ConditionB and you do not specify the contrast the results will be ConditionB vs Control? Therefore when you filter 'results' based on padj value < 0.05 you will only have the genes with significant padj values from that last comparison (i.e. not the whole dataset)? (Warning, I am really bad at R)
With an example:
I have the following dataset: two treatment conditions ('1h oxygen' and 'NAO') and one control ('Control') each with 4 bioreps.
dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design=~treatment)
#relevel against control
dds$treatment <- relevel(dds$treatment, ref="Control")
dds <- DESeq(dds)
# Make three results, one of each comparison + default 'uncontrast')
res.1hvsC <- results(dds,contrast=c("treatment", "1h_Oxygen", "Control"))
res.naovsC <- results(dds,contrast=c("treatment","NAO", "Control"))
res.uncontrast <- results(dds)
# Grab the significant genes
res.1hvC.Sig <- res.1hvsC[ which(res.1hvsC$padj < 0.05), ]
res.naovC.Sig <- res.naovsC[ which(res.naovsC$padj < 0.05), ]
res.uncontrast.sig <- res.uncontrast[ which(res.uncontrast$padj < 0.05), ]
Then I examine these dataframes and find:
> res.1hvC.Sig
log2 fold change (MAP): treatment 1h_Oxygen vs Control
Wald test p-value: treatment 1h_Oxygen vs Control
DataFrame with 3838 rows and 6 columns
> res.naovC.Sig
log2 fold change (MAP): treatment NAO vs Control
Wald test p-value: treatment NAO vs Control
DataFrame with 3017 rows and 6 columns
> res.uncontrast.Sig
log2 fold change (MAP): treatment NAO vs Control
Wald test p-value: treatment NAO vs Control
DataFrame with 3017 rows and 6 columns
As you can see, the DataFrame for the last (default contrast) and specified contrast (NAO v Control) have the same size (and values.. not shown). From this toy example, it makes me think that if I want to look at all genes across both samples that have a significant padj I need to either do the intersection of these two gene lists (res.1hvC.Sig and res.nao.Sig) to have all genes that have padj < 0.05 in each condition or a combined list (with only unique identifiers) for each.
Do you have any recommendations or resource suggestions on how to proceed with this conundrum? Specifically with multiple conditions?
Thank you very much for your time (+ Deseq2!!),
/Courtney
To extend on this observation, I redid the analysis with the LRT and found that now each comparison has the same number of values where each gene has the same padj but different fold-change (depending on the condition). So would you recommend when doing a 'two (or more) treatments vs control' experiment that we should use the LRT?
Again, apologies for my ignorance in stats (a crutch wielded shamefully by many biologists) could you clarify some wording for me from the vignette - by 'testing 3 or more levels' does this mean in essence, 3 or more conditions (e.g., control, conditionA, conditionB)?
"The LRT is therefore useful for testing multiple terms at once, for example testing 3 or more levels of a factor at once, or all interactions between two variables. The LRT for count data is conceptually similar to an analysis of variance (ANOVA) calculation in linear regression, except that in the case of the Negative Binomial GLM, we use an analysis of deviance (ANODEV), where the deviance captures the difference in likelihood between a full and a reduced model."
If so, then I think LRT is the way to go for such experimental designs?
Thanks very much for helping me with this!
Update on code:
dds <- DESeqDataSetFromMatrix(countData = countData,colData = colData, design=~treatment)
dds$treatment <- relevel(dds$treatment, ref="Control")
dds_lrt <- DESeq(dds, test="LRT", reduced=~1)
# get results for each contrast
res.1hvsC<-results(dds_lrt,contrast=c("treatment", "1h_Oxygen", "Control"))
res.naovsC<-results(dds_lrt,contrast=c("treatment","NAO", "Control"))
res.uncontrast <- results(dds_lrt)
# get significant results for each contrast
res.1hvC.Sig <- res.1hvsC[ which(res.1hvsC$padj < 0.05), ]
res.naovC.Sig <- res.naovsC[ which(res.naovsC$padj < 0.05), ]
res.uncontrast.Sig <- res.uncontrast[ which(res.uncontrast$padj < 0.05), ]
> res.1hvC.Sig
log2 fold change (MLE): treatment 1h_Oxygen vs Control
LRT p-value: '~ treatment' vs '~ 1'
DataFrame with 4872 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
SS50377_r0010 89915.71117 -1.361059 0.4515020 8.875474 0.0118226633 0.0215168112
> res.naovC.Sig
log2 fold change (MLE): treatment NAO vs Control
LRT p-value: '~ treatment' vs '~ 1'
DataFrame with 4872 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
SS50377_r0010 89915.71117 -0.6538221 0.4515003 8.875474 0.0118226633 0.0215168112
> res.uncontrast.Sig
log2 fold change (MLE): treatment NAO vs Control
LRT p-value: '~ treatment' vs '~ 1'
DataFrame with 4872 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
SS50377_r0010 89915.71117 -0.6538221 0.4515003 8.875474 0.0118226633 0.0215168112