Hey everyone,
I know the plotPCA function from DESeq2 uses only the 500 most variable genes by default. Does it make sense to plot the variance explained by PC1 and PC2 as a function of the number of genes considered? Has anyone done this?
Something like this, where X is the number of genes considered and Y is the sum of the variance explained by the first two PCs.
Thank you in advance!
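Here is a minimal sketch of that curve in Python with NumPy, not R. It follows what I understand plotPCA to do: rank genes by row variance, keep the top ntop, run a centered (unscaled) PCA with samples as observations, and report each PC's share of the total variance. It assumes `expr` is a genes x samples matrix that is already variance-stabilized (for example, exported from vst() or rlog()). The function names are hypothetical, and the random matrix is placeholder data, not real counts.

```python
import numpy as np

def percent_var_top_genes(expr, ntop):
    """Fraction of variance explained by each PC, using the ntop most variable genes.
    Assumption: expr is genes x samples and already variance-stabilized."""
    # Per-gene variance (n-1 denominator, like R's var/rowVars)
    rv = expr.var(axis=1, ddof=1)
    # Indices of the ntop most variable genes
    select = np.argsort(rv)[::-1][:min(ntop, len(rv))]
    # Samples become observations, selected genes become features
    x = expr[select, :].T
    # Center each gene but do not scale it (prcomp's defaults)
    x = x - x.mean(axis=0)
    # Squared singular values are proportional to the PC variances
    s = np.linalg.svd(x, compute_uv=False)
    ev = s ** 2
    return ev / ev.sum()

def pc12_curve(expr, ntops):
    """Combined variance explained by PC1 and PC2 for each candidate ntop."""
    return [float(percent_var_top_genes(expr, n)[:2].sum()) for n in ntops]

# Placeholder data: 2000 "genes" x 12 "samples" of random noise
rng = np.random.default_rng(0)
expr = rng.normal(size=(2000, 12))
ntops = [100, 250, 500, 1000, 2000]
for n, v in zip(ntops, pc12_curve(expr, ntops)):
    print(n, round(v, 3))
```

With real data, the (ntop, value) pairs can go straight into a line plot. Note that each point comes from a different gene set, so the Y values are not variances of the same data.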
Indeed, we are constantly changing the amount of variance in our data. But this was the best way I could think of to get a better grasp of how many genes to include, so that, at that level of variance, a two-PC PCA plot explains as much of it as possible. Does this make sense? Would you do it differently?
I was also thinking about including more PCs (up to 3 or 4), which we could then plot pairwise (PC1 vs PC2, PC1 vs PC3, ...). What do you think?
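The pairwise idea can be sketched the same way. The sketch below keeps the PC scores (sample coordinates) for a given ntop and builds one panel per pair among the first k PCs. Each panel is labelled with the two PCs' variance shares. It also prints the cumulative variance of the first 2, 3 and 4 PCs. As before, this assumes a variance-stabilized genes x samples matrix. The function names and the random data are hypothetical, and no plotting library is assumed.

```python
import itertools
import numpy as np

def pca_top_genes(expr, ntop=500):
    """PC scores (samples x PCs) and variance fraction per PC from the ntop
    most variable genes. Assumption: expr is genes x samples, variance-stabilized."""
    rv = expr.var(axis=1, ddof=1)
    select = np.argsort(rv)[::-1][:min(ntop, len(rv))]
    x = expr[select, :].T
    x = x - x.mean(axis=0)  # center genes, no scaling
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u * s  # sample coordinates on each PC (equal to X @ V)
    ev = s ** 2
    return scores, ev / ev.sum()

def pairwise_panels(scores, percent_var, k=4):
    """One (x_label, y_label, xs, ys) tuple per pair among the first k PCs."""
    k = min(k, scores.shape[1])
    panels = []
    for i, j in itertools.combinations(range(k), 2):
        lx = f"PC{i + 1}: {100 * percent_var[i]:.1f}% variance"
        ly = f"PC{j + 1}: {100 * percent_var[j]:.1f}% variance"
        panels.append((lx, ly, scores[:, i], scores[:, j]))
    return panels

# Placeholder data: 2000 "genes" x 12 "samples" of random noise
rng = np.random.default_rng(0)
expr = rng.normal(size=(2000, 12))
scores, pv = pca_top_genes(expr, ntop=500)
for lx, ly, _, _ in pairwise_panels(scores, pv, k=4):
    print(lx, "vs", ly)
print("cumulative variance, first 2/3/4 PCs:", np.cumsum(pv)[1:4].round(3))
```

With k=4 this gives six panels (PC1 vs PC2 through PC3 vs PC4), and each tuple's xs/ys can be passed to any scatter-plot function.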
I mean, exploring your data in many ways is always a good idea (here I don't mean running a bunch of null-hypothesis tests, but EDA). You can't go wrong.