Dear Michael, when it comes to analysing different comparisons in a complex experimental design, there are, generally speaking, two apparent solutions:
- the first is to consider the whole experiment (i.e. using a single sample table containing all the samples) and write code to select the desired comparisons, or
- the second is to select the desired groups and run the analysis separately (i.e. duplicating the folder so that it contains just the desired list of samples).
Let's take an example: say we have three conditions (wt, ko, wt-treated), and I want to know the differentially expressed genes for wt-treated vs wt and for ko vs wt. If I follow point 1 above, I can set wt as the reference level, so that both comparisons are made against it, and get the results:
dds$condition <- relevel(dds$condition, ref = "wt")
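To make point 1 concrete, here is a minimal sketch of what I mean. It is not my real pipeline: the counts are simulated with DESeq2's makeExampleDataSet, the sample layout (4 replicates per condition) is invented, and I write the level as wt_treated with an underscore only to keep the factor level free of special characters.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real count data (assumption: 12 samples, 4 per condition)
dds <- makeExampleDataSet(n = 1000, m = 12)
dds$condition <- factor(rep(c("wt", "ko", "wt_treated"), each = 4))

# wt becomes the reference level, so both coefficients are "vs wt"
dds$condition <- relevel(dds$condition, ref = "wt")

# One fit on the whole experiment (design is ~ condition)
dds <- DESeq(dds)

# Both comparisons are extracted from the same fitted object
res_treated <- results(dds, contrast = c("condition", "wt_treated", "wt"))
res_ko      <- results(dds, contrast = c("condition", "ko", "wt"))
```

With this layout, resultsNames(dds) lists one coefficient per comparison against wt.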
If I use the same code but merely duplicate my folder of aligned files (e.g. just wt and wt-treated), together with another sample table containing just the selected samples, I wonder whether I would get different results. If so, what is the rationale for choosing one method over the other (regardless of the fact that the first one is more convenient)?
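A sketch of the comparison I have in mind, again on simulated counts rather than my real data (the object names and the 4-replicate layout are my own placeholders). Instead of duplicating folders, the subset is taken directly from the full object, which I assume is equivalent to building it from a reduced sample table. What I do know is that the size factors and dispersions are estimated from whichever samples are in the object, so the two fits need not agree exactly.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real experiment (assumption: 4 replicates per condition)
dds <- makeExampleDataSet(n = 1000, m = 12)
dds$condition <- relevel(factor(rep(c("wt", "ko", "wt_treated"), each = 4)), ref = "wt")

# Method 1: fit on all samples, then extract wt_treated vs wt
dds_full <- DESeq(dds)
res_full <- results(dds_full, contrast = c("condition", "wt_treated", "wt"))

# Method 2: keep only wt and wt_treated, drop the unused ko level, refit
dds_sub <- dds[, dds$condition %in% c("wt", "wt_treated")]
dds_sub$condition <- droplevels(dds_sub$condition)
dds_sub <- DESeq(dds_sub)
res_sub <- results(dds_sub, contrast = c("condition", "wt_treated", "wt"))

# How far apart are the two answers?
lfc_cor   <- cor(res_full$log2FoldChange, res_sub$log2FoldChange, use = "complete.obs")
disp_diff <- summary(dispersions(dds_full) - dispersions(dds_sub))
```

Looking at lfc_cor and disp_diff on the real data would show how much the ko samples influence the wt-treated vs wt comparison.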
It is a similar story when we compute a PCA plot. Whether I subset from the whole experiment or pick just the desired samples, I guess there are only tiny differences in terms of the maths; in other words, the PCA plots look very similar. So, again, given that in a certain plot I just want to see the difference between two groups, might the method in point 2 be better this time?
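For the PCA case, this is roughly what I am comparing, sketched on simulated counts (the data and the object names are placeholders, not my real experiment). I use varianceStabilizingTransformation here only because the toy data set is small. In the second call the transformation still comes from all samples, but the principal components are recomputed on the two groups alone.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real experiment (assumption: 4 replicates per condition)
dds <- makeExampleDataSet(n = 1000, m = 12)
dds$condition <- factor(rep(c("wt", "ko", "wt_treated"), each = 4))

# Transform once, using all samples
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# PCA coordinates for the whole experiment
pca_all <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)

# PCA coordinates for wt and wt_treated only (components recomputed on the subset)
vsd_sub <- vsd[, vsd$condition %in% c("wt", "wt_treated")]
pca_sub <- plotPCA(vsd_sub, intgroup = "condition", returnData = TRUE)
```

Calling plotPCA without returnData = TRUE draws the plot instead of returning the coordinates.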
Thanks in advance, I hope I made myself clear.