I made a PCA plot from the rlog-transformed matrix produced by DESeq2, and one of my sample groups did not cluster together.
plotPCA(rld, intgroup=c("condition"))
http://s3.postimg.org/whlalmp6b/Rplot03.png
When I run the same matrix through prcomp from base R, the samples cluster more tightly.
cruzi.pca <- prcomp(rldMat2,
                    center = TRUE,
                    scale. = FALSE)
library(ggbiplot)
g <- ggbiplot(pcobj = cruzi.pca, scale = 1, obs.scale = 1, var.scale = 1,
              groups = groups, ellipse = TRUE,
              circle = TRUE, var.axes = FALSE)
g <- g + scale_color_discrete(name = '')
g <- g + theme(legend.direction = 'horizontal',
               legend.position = 'top')
print(g)
http://s4.postimg.org/vdsax2drx/Rplot04.png
How can I decide which plot to use? And why does the same matrix of transformed data cluster so differently in the two plots? Thank you.
Adding to Mike's comment: the difference is most likely due to the number of genes used by DESeq2::plotPCA. By default it uses only the 500 genes with the highest variance (the ntop argument), whereas your prcomp call uses every gene in rldMat2, assuming rld and rldMat2 contain exactly the same data.
Indeed, this explains the difference.
Why would I run the PCA on only 500 genes instead of all of them?
Ranking the genes by their variance across all samples and running the PCA on only the top-ranked ones helps make the sample groupings clearer, since genes that barely vary across samples add little beyond noise. Of course, you can tune this parameter, but 500 is a good number for many RNA-seq datasets.
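To make the comparison concrete, here is a minimal sketch of the gene selection described above, written in base R so it runs without DESeq2. The matrix mat is simulated and only stands in for the rlog matrix (genes in rows, samples in columns), and top_var_pca is a hypothetical helper name, not a DESeq2 function. Treat it as an illustration of the idea rather than the exact plotPCA code.

```r
# Simulated stand-in for the rlog matrix: 2000 genes (rows) x 6 samples (columns).
# With real data you would use the matrix extracted from rld instead (assumption).
set.seed(1)
mat <- matrix(rnorm(2000 * 6), nrow = 2000,
              dimnames = list(paste0("gene", 1:2000), paste0("sample", 1:6)))

# Hypothetical helper: run prcomp on the ntop most variable genes only.
top_var_pca <- function(mat, ntop = 500) {
  rv <- apply(mat, 1, var)                      # per-gene variance across samples
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, length(rv)))]
  # prcomp expects samples in rows, so transpose the selected genes
  prcomp(t(mat[keep, , drop = FALSE]), center = TRUE, scale. = FALSE)
}

pca_top <- top_var_pca(mat, ntop = 500)         # top 500 genes, the plotPCA default
pca_all <- top_var_pca(mat, ntop = nrow(mat))   # every gene, as in the prcomp call

# Compare the sample coordinates on the first two components.
head(pca_top$x[, 1:2])
head(pca_all$x[, 1:2])
```

With the real objects, the same comparison can be made directly by passing a larger ntop to plotPCA, for example plotPCA(rld, intgroup = "condition", ntop = nrow(rld)), which should bring the plot much closer to the prcomp result.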
Thanks for the clarifications, Michael.