Why is the dispersion high for data which yields the following PCA?
As far as I have understood the DESeq2 vignette section "If I have multiple groups, should I run all together or split into pairs of groups?", the main thing that influences the dispersion is within-group variation.
However, for this PCA the dispersion is high, but it is only the between-group variation that is (very) high, not the within-group variation.
Is my understanding correct that dispersion should mainly be influenced by within-group variation (as in the example in the vignette) and not by between-group variation? This is what the vignette (and the answer above) seems to say. However, when I look at some gene values and dispersions in this example, I get very high dispersion estimates, and they seem to be caused by between-group variation.
Dispersion depends on the design. Can you show design(dds) and colData(dds)?
You are right, I have chosen design=~1. This explains the dispersion plot.
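To see why design=~1 inflates the dispersion, here is a rough base-R sketch. It is not DESeq2's estimator (DESeq2 uses a likelihood-based fit with size factors and shrinkage); it only uses a simple method-of-moments estimate, alpha = (variance - mean) / mean^2, on made-up counts for one hypothetical gene. The counts, the group labels and the helper mom_disp are all assumptions for illustration.

```r
# Hypothetical counts for one gene: two groups of three samples,
# tight within each group but far apart between groups.
group  <- factor(rep(c("A", "B"), each = 3))
counts <- c(95, 102, 108, 980, 1010, 1045)

# Simple method-of-moments dispersion (an illustration, not DESeq2's method):
# alpha = (var - mean) / mean^2, floored at zero.
mom_disp <- function(x) max((var(x) - mean(x)) / mean(x)^2, 0)

# Design ~ 1: all six samples are treated as replicates of one condition,
# so the between-group difference is counted as dispersion.
disp_intercept_only <- mom_disp(counts)

# Design ~ group: dispersion is measured around each group's own mean.
disp_within <- mean(tapply(counts, group, mom_disp))

print(c(intercept_only = disp_intercept_only, within_group = disp_within))
```

With these numbers the intercept-only estimate is far larger than the within-group one, which matches what the dispersion plot showed with design=~1.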
I understand this to be a mistake. Now it seems to me that even for QC figures, one should use the design one intends to use for DE. One should simply set blind=TRUE as an rlog parameter (for the sake of QC).
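A minimal sketch of that workflow, assuming simulated data from makeExampleDESeqDataSet in place of the real counts (the values n = 1000, m = 6 and betaSD = 2 are arbitrary choices to get strong between-group differences, and the column name condition comes from the simulated colData, not from my data):

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data set: two conditions, 3 samples each.
dds <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 2)

# Check what the dispersion estimates will be based on.
design(dds)    # ~ condition, the design intended for DE
colData(dds)

dds <- estimateSizeFactors(dds)

# Copy with the intercept-only design, for comparison only.
dds_null <- dds
design(dds_null) <- ~ 1

# Dispersions under the intended design vs. under ~ 1.
dds      <- estimateDispersions(dds)
dds_null <- estimateDispersions(dds_null)
plotDispEsts(dds)
plotDispEsts(dds_null)

med_design <- median(dispersions(dds), na.rm = TRUE)
med_null   <- median(dispersions(dds_null), na.rm = TRUE)
print(c(design = med_design, intercept_only = med_null))

# QC: keep the real design on the object, and let rlog ignore it via blind = TRUE.
rld <- rlog(dds, blind = TRUE)
plotPCA(rld, intgroup = "condition")
```

The object keeps the design intended for DE, so the dispersion plot reflects within-group variation, while the PCA is still computed from a transformation that does not use the group labels.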