I am new to bioinformatics and am trying to learn as much as I can in a short time so that I can take on the data analysis in our lab.
I started by reading up on RNA-seq, since we have bulk RNA-seq data from sperm to compare across three conditions: wildtype (WT), heterozygous (HZ) and mutant (MT), each with three biological replicates. For two conditions, such as WT vs MT, the analysis with DESeq2 is straightforward. It gets puzzling with three conditions, because I don't know whether the Wald test considers all three conditions in the condition column of the design file. When I look at the factor levels, I see three levels (WT, HZ and MT), with WT in the first position because I releveled it that way. Still, I am not sure what exactly is happening under the hood. When I call resultsNames() I get: "Intercept" "condition_hz_vs_wt" "condition_rd_vs_wt"
- So, what happened when I called DESeq()? Did the Wald test include all nine samples in its calculations, or did it only compute the HZ vs WT and MT vs WT comparisons separately?
- In theory, if I split the input counts into WT-MT and WT-HZ pairs and run the whole procedure separately for each pair (two different design files, separate sessions), I should get the same results (L2FC, p-value, adjusted p-value, etc.) as with the combined approach described above, with all samples and conditions together. Am I right?
- If the previous point is true, I would run three separate analyses to compare WT-HZ, WT-MT and HZ-MT, since an ANOVA-like approach seems strange here and I don't understand how three conditions would be handled in such an approach.
Thanks in advance. Trying to get a good grasp on these things, however, it seems that sometimes the descriptions online confuse me even more.
Cheers
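Yes, all nine samples are used when fitting the model. The two coefficients you see in resultsNames() are simply expressed relative to the reference level (WT). You can use the contrast argument of results() to extract any pairwise comparison you want, including HZ vs MT. Running the two-group comparisons separately will likely give similar but not identical results, because the normalization and the estimation of the model parameters will be slightly different; the vignette has a section about that. You can compare all three levels to each other in a single analysis; see the vignette on contrasts and on the question of when to split groups and when not to.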
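To make the points above concrete, here is a minimal sketch of a single three-level analysis. It assumes DESeq2 is installed and uses simulated counts from makeExampleDESeqDataSet() as a stand-in for your real data. The level names WT, HZ and MT are taken from your description, and the object names (dds, res_mt_hz, and so on) are just illustrative; substitute the labels that actually appear in your design file.

```r
library(DESeq2)

# Simulated placeholder data: 1000 genes x 9 samples (NOT real data).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 9)

# Overwrite the example's two-level factor with three conditions,
# three replicates each, and WT as the reference (first) level.
dds$condition <- factor(rep(c("WT", "HZ", "MT"), each = 3),
                        levels = c("WT", "HZ", "MT"))
design(dds) <- ~ condition

# One model is fitted to all nine samples: size factors and
# dispersions are estimated from the full data set.
dds <- DESeq(dds)

# Coefficients are expressed relative to the reference level WT:
# "Intercept" "condition_HZ_vs_WT" "condition_MT_vs_WT"
resultsNames(dds)

# Pairwise Wald tests, all extracted from the same fitted model.
res_hz_wt <- results(dds, contrast = c("condition", "HZ", "WT"))
res_mt_wt <- results(dds, contrast = c("condition", "MT", "WT"))
res_mt_hz <- results(dds, contrast = c("condition", "MT", "HZ"))

# ANOVA-like alternative: a likelihood ratio test against the
# intercept-only model asks whether a gene differs between ANY
# of the three conditions.
dds_lrt <- DESeq(dds, test = "LRT", reduced = ~ 1)
res_any <- results(dds_lrt)
```

The HZ vs MT comparison does not need its own coefficient or a separate run: the contrast is computed from the two coefficients that are already in the model. The LRT p-values answer a different question than the pairwise Wald tests, so which one to report depends on what you want to claim.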