Hi,
I'm really new to PCA and am just trying to understand how it works and the underlying stats. My problem is that when I run it on all 3 of my biological replicates (3 controls and 3 treatments), the 3 controls cluster together well, but one of the "treatment" samples is really far off the other 2 (which cluster well with each other). So I decided to re-run it after removing the replicate containing the outlier "treatment" sample. This time my "treatment" samples cluster well (as they did in the first place), BUT the 2 controls are now far from each other! I'm confused as to why/how this is happening. I thought that maybe using only 2 replicates changes the balance between the sources of variation, but I'd really appreciate any ideas!!
Thanks so much for your time!!
Katerina
Please read my answer on Biostars: https://www.biostars.org/p/280615/#280634
By the way: you should really consider adding your PCA bi-plot figures to your post, and showing the explained variation along each axis. For one, "far off" is not a quantifiable term. Two samples may look distant visually, but mathematically / statistically the distance may be virtually meaningless if the axis explains only a few percent of the variation.
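As a rough illustration of how the explained variation per axis can be obtained, here is a minimal NumPy sketch. It is not taken from the poster's analysis: the function name `explained_variance_percent` and the simulated 6 x 500 matrix are hypothetical stand-ins for a real samples-by-genes expression table.

```python
import numpy as np

def explained_variance_percent(X):
    """Percent of total variance captured by each principal component.

    X has one row per sample and one column per feature (e.g. gene).
    """
    Xc = X - X.mean(axis=0)                    # centre each feature on its mean
    S = np.linalg.svd(Xc, compute_uv=False)    # singular values, largest first
    var = S ** 2                               # proportional to variance per PC
    return 100 * var / var.sum()

# Hypothetical data: 3 "controls" and 3 "treatments", 500 simulated genes.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 500))
X[3:, :50] += 2.0                              # simulated treatment effect

pct = explained_variance_percent(X)
for i, p in enumerate(pct[:2], start=1):
    print(f"PC{i}: {p:.1f}% of variance")
```

Putting these percentages on the axis labels of the bi-plot lets readers judge whether a visual gap along a given axis reflects much of the total variation.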
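If you perform PCA on only 2 samples, they will always appear distant from each other, on opposite sides of the plot. That is the nature of PCA: the data are centred on their mean, so with 2 samples each one sits at an equal distance on either side of the origin along PC1. It does not, however, necessarily imply that the 2 samples exhibit large-scale differences.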
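The two-sample behaviour can be checked with a small sketch, assuming a plain SVD-based PCA in NumPy. The helper `pca_scores` and the simulated samples are hypothetical; the second sample is deliberately built to be almost identical to the first.

```python
import numpy as np

def pca_scores(X):
    """Return sample coordinates on each PC and the fraction of variance per PC.

    X has one row per sample and one column per feature (e.g. gene).
    """
    Xc = X - X.mean(axis=0)                          # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * S                                   # sample coordinates
    var = S ** 2
    return scores, var / var.sum()

# Hypothetical data: two nearly identical samples across 1000 simulated genes.
rng = np.random.default_rng(0)
base = rng.normal(size=1000)
two = np.vstack([base, base + rng.normal(scale=0.01, size=1000)])

scores, explained = pca_scores(two)
print("PC1 coordinates:", scores[:, 0])              # equal size, opposite sign
print("PC1 fraction of variance:", explained[0])
```

Even though the two simulated samples differ only by tiny noise, their PC1 coordinates are mirror images and PC1 accounts for essentially all of the variance, so the plot alone says nothing about how large the difference is in absolute terms.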