For RNA-seq analysis, I am generating a PCA plot for various strains with three biological replicates each. When I make the PCA plot, I get a symbol on the plot for every replicate. For a large dataset, is there a way to show a single symbol per strain (the average of the three biological replicates) instead of all three replicates?
I've received this question before on the support site, and my answer is that I really don't understand the point of a PCA plot in which you can't see how the samples within a group spread out. I suppose you can compare the distances between 3 or more conditions, but those distances relative to the biological variance are what I'm most interested in seeing in a PCA plot.
If you really want to make this plot despite the shortcomings I've mentioned, you can compute the row-wise average of the transformed values for each condition and make a PCA plot of just the means. The rowMeans() function can be used to compute the means over a subset of the columns, and cbind() can be used to bind the columns of means from the different groups together.
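As a rough sketch of that approach (not a definitive recipe), the code below uses a simulated matrix in place of real data. The names `mat`, `condition`, and `group_means` are made up for illustration, and the PCA uses base R's prcomp() on all rows. In a real analysis you would substitute your own matrix of transformed values (genes in rows, samples in columns) and your own grouping factor.

```r
# Toy stand-in for a genes x samples matrix of transformed values.
# ASSUMPTION: in practice, replace `mat` and `condition` with your own data.
set.seed(1)
condition <- factor(rep(c("A", "B", "C"), each = 3))
mat <- matrix(rnorm(100 * 9), nrow = 100,
              dimnames = list(paste0("gene", 1:100),
                              paste0(condition, "_rep", rep(1:3, times = 3))))

# Row-wise mean over each condition's columns, then bind the
# per-condition mean vectors together as columns.
group_means <- do.call(cbind, lapply(levels(condition), function(g) {
  rowMeans(mat[, condition == g, drop = FALSE])
}))
colnames(group_means) <- levels(condition)

# PCA on the means only. prcomp() expects observations in rows,
# so transpose to get one row per condition.
pca <- prcomp(t(group_means))
percent_var <- round(100 * pca$sdev^2 / sum(pca$sdev^2))

# One symbol per condition, labelled with the condition name.
plot(pca$x[, 1], pca$x[, 2], pch = 19,
     xlab = paste0("PC1: ", percent_var[1], "% variance"),
     ylab = paste0("PC2: ", percent_var[2], "% variance"))
text(pca$x[, 1], pca$x[, 2], labels = rownames(pca$x), pos = 3)
```

Note that the principal components here are computed from the means themselves, so the axes will generally differ from those of a PCA run on all replicates. The two plots are therefore not directly superimposable.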
Thanks Micheal. I completely agree with your reasoning. However, I have 30 samples in triplicate, and visualizing the relationships between samples becomes difficult with so many data points. I intend to make both PCA plots: one with individual replicates (to see the spread within samples) and one with the average of the replicates (to see the spread between samples).