Question about PCA and transformed data in DESeq2
@amandinefournierchu-lyonfr-5921
Dear Michael, Simon, Wolfgang and others,

I am a little bit confused about the count data transformations and the Principal Component Analysis in DESeq2.

In the last vignette, the example on pages 18-19 shows a PCA plot of the samples, obtained with regularized log transformed data (rld). But in the plotPCA R documentation, it is written to use a SummarizedExperiment with transformed data produced by 'varianceStabilizingTransformation' (vst). This is quite discrepant, so I wonder which type of transformation I should use.

Moreover, when applied to my real dataset (one group of 2 patients and another group of 2 control cases), I see the following:
- when no transformation is applied, axis 1 = pathology (patients vs control cases) and axis 2 = unknown factor
- when transformed with r-log (rld), axis 1 = unknown factor and axis 2 = pathology
- when transformed with variance (vst), axis 1 = sex (girls vs boys), axis 2 = unknown factor

So, I wonder if the data are driven by the pathology or by the sex of the subjects? Is it incorrect to use untransformed data in PCA?

I don't really understand the usefulness of transforming the data since, as far as I understand, it is not used in DE analysis afterwards.

Thank you in advance for your reply.

Best regards,
Amandine

-----
Amandine Fournier
Lyon Neuroscience Research Center
and Lyon Civil Hospitals (France)
DESeq2 DESeq2 • 1.9k views
@mikelove
hi Amandine,

On Oct 10, 2013 5:17 AM, <amandine.fournier@chu-lyon.fr> wrote:
> In the last vignette, the example on pages 18-19 shows a PCA plot of the samples, obtained with regularized log transformed data (rld).
> But in the plotPCA R documentation, it is written to use a SummarizedExperiment with transformed data produced by 'varianceStabilizingTransformation' (vst).
> This is quite discrepant, so I wonder which type of transformation I should use.

Thanks for pointing this out. I will fix the plotPCA manual page. The function is written to accept any SummarizedExperiment object produced by either function.

> Moreover, when applied to my real dataset (one group of 2 patients and another group of 2 control cases), I see the following:
> - when no transformation is applied, axis 1 = pathology (patients vs control cases) and axis 2 = unknown factor
> - when transformed with r-log (rld), axis 1 = unknown factor and axis 2 = pathology
> - when transformed with variance (vst), axis 1 = sex (girls vs boys), axis 2 = unknown factor

The order of the principal components can change with slight fluctuations in the data, so this is not necessarily an indication that something is wrong. If PC1 explains 30% of the variance and PC2 explains 29%, it is easy for the order to swap.

> So, I wonder if the data are driven by the pathology or by the sex of the subjects? Is it incorrect to use untransformed data in PCA?
> I don't really understand the usefulness of transforming the data since, as far as I understand, it is not used in DE analysis afterwards.

The transformations are useful for examining the samples for outliers. With the untransformed counts, the variance is dominated by a few large counts. With the log or shifted log, a lot of variance can come from low-count genes.
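The point about which genes dominate the variance can be illustrated with a small simulation. This is only a sketch in base R, not part of DESeq2: the two "genes" and their Poisson means (5 and 10000) are made-up values, and neither gene carries any biological signal, only sampling noise.

```r
set.seed(1)
n_samples <- 200

# Two hypothetical genes with pure Poisson ("shot") noise and no group effect.
low  <- rpois(n_samples, lambda = 5)      # low-count gene
high <- rpois(n_samples, lambda = 10000)  # high-count gene

# Sample variance of each gene on the raw scale and on the shifted-log scale.
raw_var <- c(low = var(low),           high = var(high))
log_var <- c(low = var(log2(low + 1)), high = var(log2(high + 1)))

# Share of the total variance contributed by each gene under each scale.
print(round(raw_var / sum(raw_var), 4))
print(round(log_var / sum(log_var), 4))
```

On the raw scale the high-count gene accounts for almost all of the variance, while after log2(x + 1) the low-count gene does. A PCA on either scale would therefore be driven mostly by noise from one end of the count range, which is the motivation for a transformation that evens out the variance across the range.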
The transformations help to compare samples while prioritizing genes whose variation is (hopefully) biologically relevant rather than due to technical artifacts or "shot noise".

Mike
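As a rough sketch of how the two transformations can be compared in practice, the following runs plotPCA on both an rlog and a vst object and prints the percentage of variance explained by PC1 and PC2. It assumes a recent DESeq2 release, where plotPCA has a returnData argument and stores a "percentVar" attribute on the result; the simulated dataset from makeExampleDESeqDataSet is only a stand-in for real counts, and the sizes (1000 genes, 4 samples) are arbitrary.

```r
library(DESeq2)

# Simulated counts stand in for real data: 1000 genes, 4 samples,
# with a two-level 'condition' column in colData.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 4)

# Both transformations return an object that plotPCA() accepts.
rld <- rlog(dds, blind = TRUE)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# returnData = TRUE returns the PC coordinates instead of drawing a plot;
# the fraction of variance per component is attached as an attribute.
pca_rld <- plotPCA(rld, intgroup = "condition", returnData = TRUE)
pca_vsd <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)

# Percent of variance explained by PC1 and PC2 under each transformation.
print(round(100 * attr(pca_rld, "percentVar"), 1))
print(round(100 * attr(pca_vsd, "percentVar"), 1))
```

If the two percentages are close to each other, a swap of PC1 and PC2 between transformations is unsurprising and says little about which factor "drives" the data.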