Given that the PCA plot is likely to change somewhat depending on the number of genes you decide to specify with the ntop parameter, are there any recommendations on how to best set this value besides arbitrarily setting it at the default of 500/1000? Could including all genes have a negative effect if a lot of the genes have low variance?
Thanks!
Hi @mikelove. When I increase the ntop value in my data to 1000, 2000, and 3000, my PC1 and PC2 get worse and worse. I am using VST-normalised counts (and this happens with both blind=TRUE and blind=FALSE).
Hmm, I wouldn't say the PCs "get worse". They just show you something else. This has to do with the theory of PCA. When we restrict to the top-variance genes, PC1 is typically aligned with the direction along which those genes vary, so PC1 makes up most of the variance of this subset of the entire space. When we increase the number of genes we look at, the added lower-variance genes contribute variance in other directions, so the percent of variance explained by PC1 goes down. This isn't specific to your dataset; you'd get something similar with data simulated from a multivariate Gaussian distribution with a certain covariance structure.
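
To make the simulation point concrete, here is a rough toy sketch in plain Python. It is not DESeq2's plotPCA code, and every number in it is an assumption chosen for illustration: 12 samples in two groups, 2000 genes, of which 100 carry a group effect of 3 on top of unit-variance Gaussian noise. Genes are ranked by variance across samples and the top ntop are kept, then the fraction of variance captured by PC1 is computed. The helper name pc1_fraction is made up for this example.

```python
import random
import statistics


def pc1_fraction(rows):
    """Fraction of total variance captured by PC1.

    rows: list of genes, each a list of per-sample values.
    Uses the samples-by-samples Gram matrix, whose nonzero eigenvalues
    match those of the gene covariance matrix up to a constant factor.
    """
    n = len(rows[0])
    centered = []
    for row in rows:
        mean = sum(row) / n
        centered.append([x - mean for x in row])  # center each gene across samples
    gram = [[sum(g[i] * g[j] for g in centered) for j in range(n)] for i in range(n)]
    total = sum(gram[i][i] for i in range(n))  # trace = total variance (scaled)

    # Power iteration for the largest eigenvalue, from a random start vector.
    start = random.Random(1)
    v = [start.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(200):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    top = sum(v[i] * sum(gram[i][j] * v[j] for j in range(n)) for i in range(n))
    return top / total


# Toy data (all values are assumptions for illustration only).
rng = random.Random(42)
n_samples, n_genes, n_signal = 12, 2000, 100
group = [1.0] * 6 + [-1.0] * 6  # two conditions, six samples each
genes = []
for g in range(n_genes):
    effect = 3.0 if g < n_signal else 0.0  # only the first 100 genes differ by group
    genes.append([effect * group[s] + rng.gauss(0.0, 1.0) for s in range(n_samples)])

# Rank genes by variance across samples and keep the top ntop.
ranked = sorted(genes, key=statistics.variance, reverse=True)
fractions = {ntop: pc1_fraction(ranked[:ntop]) for ntop in (100, 500, 2000)}
for ntop, frac in fractions.items():
    print(f"ntop={ntop:5d}  PC1 explains {100 * frac:.1f}% of variance")
```

In this setup the PC1 percentage falls as ntop grows, even though the group structure in the data is unchanged. The extra genes only add noise variance to the total, which is the same effect described above.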