Hi All,
I'm trying to draw scaled PCA plot for my sequencing data while using DESeq2. The code, I am using is:
ds <- DESeqDataSetFromMatrix(countData=counts, colData=samples, design=~condition) ds$condition <- relevel(ds$condition, "Ctrl") ds <- ds[rowSums(counts(ds)>1) >= 2, ] #PCA plot # transformation rld_ds <- rlog(ds, blind=FALSE) #regularized log-transformation #PCA plot1 plotPCA(rld_ds, intgroup = c("SampleName")) # PCA plot 2 using ggplot2 (pcadata <- plotPCA(rld_ds, intgroup = c( "condition", "SampleName"), returnData=TRUE)) (percentVar <- round(100 * attr(pcadata, "percentVar")) ggplot(pcadata, aes(PC1, PC2, color=condition, label=SampleName)) + geom_point(size=3) + geom_text(aes(label=SampleName),hjust=.5, vjust=-1, size=3) + xlab(paste0("PC1: ",percentVar[1],"% variance")) + ylab(paste0("PC2: ",percentVar[2],"% variance")) #plot 2
Now, my question:
Are these PCA plots already scaled due to rld transformation? if not, how can I do that?
Thanks,
Sofia
Hi Steve:
Thank you for your quick response.
As I am passing rlog transformation object (rld_ds) to plotPCA which stabilizes variance applied to the raw data, so do I still need to scale my PCAplot with:
pca <- prcomp(rld_ds, center=T, scale=T)
Thank you for your time. I look forward to learning your advice.
Sofia
I think it's best not to scale the data (make all the genes have same variance). I've posted an argument on the support site before about this, but it's impossible to find again. Basically, the rlog or vst put the data on a scale so there are no systematic differences due to the dynamic range. Forcing each gene to have variance one makes it worse (reduces signal and amplifies noise).
Hi Michael:
Thank you very much for your reply. :)
Many thanks,
Sofia
I've tried to find that post several times now, too! Sadly I've had no luck with it either :-/
Is Scaling / normalizing by row in heatmap of DESeq2 it?
Yes! Thanks, bookmarked...
How did you divine that post?
I utilized my l33t Google skillz:
pca plot scale love site:support.bioconductor.org
the search on this site is worthless, so I always just use Google directly with the site qualifier. It's the first hit on the second page (at least based on what Google thinks I want).
nice move -- strangely(?), it's also the it's also the first on the second page for me, too.
Maybe the search box on this site should just urlencode a google query with site:support.bioconductor.org appended to the end and bypass the internal search functionality entirely.
DESeq2 plotPCA and scale