Hi,
Some questions from a non-expert in statistics:
The PCA function in DESeq2 (plotPCA) selects the ntop genes with the highest variance before calculating the PCA. This appears to make the differences between samples clearer in the 2D plot. What is the statistical reason for this pre-selection? And why rank by absolute variance rather than, for example, by coefficient of variation (CV)?
And if I want to make a heatmap of the "top variable" genes, is the choice between absolute variance and CV only a matter of which genes we want to focus on, or is one statistically preferable to the other? I would be inclined to use CV, to make the selection independent of the expression level.
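To make the comparison concrete, here is a minimal sketch of the two rankings I have in mind. It is plain Python on made-up toy values, not DESeq2's own code; the gene names, the numbers, and the helper functions `top_by_variance` and `top_by_cv` are all hypothetical. It also assumes strictly positive values, since CV is not meaningful when the mean is zero or negative.

```python
from statistics import mean, variance

# Toy data (made-up): gene -> expression values across four samples.
expr = {
    "geneA": [10.0, 10.5, 14.0, 14.5],  # high expression, large absolute spread
    "geneB": [1.0, 1.2, 2.0, 2.2],      # low expression, large relative spread
    "geneC": [8.0, 8.1, 8.0, 8.1],      # high expression, nearly constant
    "geneD": [5.0, 5.5, 6.0, 6.5],      # moderate expression, moderate spread
}

def top_by_variance(expr, ntop):
    """Return the ntop gene names with the largest sample variance."""
    return sorted(expr, key=lambda g: variance(expr[g]), reverse=True)[:ntop]

def top_by_cv(expr, ntop):
    """Return the ntop gene names with the largest coefficient of variation."""
    def cv(values):
        # CV = standard deviation / mean; assumes a positive mean.
        return variance(values) ** 0.5 / mean(values)
    return sorted(expr, key=lambda g: cv(expr[g]), reverse=True)[:ntop]

print(top_by_variance(expr, 2))  # ['geneA', 'geneD']
print(top_by_cv(expr, 2))        # ['geneB', 'geneA']
```

On this toy example the two criteria already disagree: ranking by variance favours the more highly expressed geneD, while ranking by CV promotes the lowly expressed geneB.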
Thanks!
Clara