Outlier detection in DESeq
Rui Luo ▴ 20
@rui-luo-4306
Last seen 10.2 years ago
Hi all,

I have a question regarding DESeq differential expression analysis. In DESeq, is there any way to detect whether the library from one sample is totally screwed up? Or, for a single gene, whether the expression is abnormal in one sample? (In that situation, do we just discard the value or modify it?)

Thanks!

best,
Laurie

--
Rui Luo
Lab phone : 310 794-7537
Geschwind Lab
Human Genetics Department
UCLA
Genetics DESeq • 2.8k views
Simon Anders ★ 3.8k
@simon-anders-3855
Last seen 4.3 years ago
Zentrum für Molekularbiologie, Universi…
Hi Laurie

On 10/20/2010 07:25 AM, Rui Luo wrote:
> I have a question regarding to DEseq differential expression analysis.
> In DEseq, is there any way to detect whether the library from one sample
> is totally screwed up?
> Or for signal gene, the expression is abnormal in one sample (For this
> situation, do we just abandon this value or modify it)?

If you have enough replicates, you can detect an outlier sample from the fact that it is markedly different from the rest.

Possible ways to do so:

- Make a heatmap of the samples after performing a variance stabilizing transformation on the count data. This is described in the DESeq vignette. The heatmap shows you how "different" each sample is from every other sample, and if one sample is very different from its replicates, you may want to consider excluding it from the analysis.

- Make for each sample an MA plot comparing it to the "fictive reference" that I described in my reply to your other question, as follows:

    library(DESeq)

    # get an example count data set -- or use your data:
    cds <- makeExampleCountDataSet()

    # estimate the size factors:
    cds <- estimateSizeFactors( cds )

    # calculate the gene-wise geometric means
    geomeans <- exp( rowMeans( log( counts(cds) ) ) )

    # choose the sample we want to check
    j <- 1

    # plot the log fold change versus the reference against
    # the geometric mean
    plot( geomeans, counts(cds)[,j] / geomeans, pch='.', log="xy" )

    # Mark the size factor (0 log fold change):
    abline( h = sizeFactors(cds)[j] )

An odd sample should stick out by looking different. You could also take the geometric mean not over all samples but only over replicate samples, or you could simply plot two samples against each other.

Remember that there are also what we call "variance outliers", i.e., single genes that vary much more across replicates than the variance fit would suggest. The vignette tells you how to recognize them.

Simon
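The heatmap suggestion above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the counts are simulated, the sample names A1 to B3 are made up, sample A3 is deliberately corrupted, and log2(x + 1) is a crude stand-in for DESeq's variance stabilizing transformation, not the transformation itself.

```r
# Sketch only: simulated counts, hypothetical sample names, and
# log2(x + 1) as a stand-in for DESeq's variance stabilizing transformation.
set.seed(1)
n_genes <- 1000
sample_names <- paste0(rep(c("A", "B"), each = 3), 1:3)

# gene-wise means shared by all samples; negative binomial noise around them
mu <- rgamma(n_genes, shape = 2, scale = 50)
counts <- matrix(rnbinom(n_genes * 6, mu = mu, size = 10),
                 nrow = n_genes, dimnames = list(NULL, sample_names))

# deliberately spoil one library by shuffling its gene means
counts[, "A3"] <- rnbinom(n_genes, mu = sample(mu), size = 10)

# stand-in transformation (assumption, see above)
transformed <- log2(counts + 1)

# Euclidean distances between samples (samples are the columns)
d <- as.matrix(dist(t(transformed)))

# heatmap of the sample-to-sample distance matrix
heatmap(d, symm = TRUE, margins = c(6, 6))

# a simple numeric companion: mean distance of each sample to all others
mean_dist <- rowSums(d) / (ncol(d) - 1)
print(round(sort(mean_dist, decreasing = TRUE), 1))
```

With real data you would replace the simulated matrix by your own counts and, ideally, the stand-in by the transformation from the vignette. A sample whose mean distance to the others is far larger than the rest is a candidate for closer inspection, not an automatic exclusion.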
Use "Array Quality Metrics":

Hi Laurie,

To follow up on Simon's suggestion: after a variance stabilising transformation of the counts (this transformation is logarithm-like for high counts and square-root-like for low counts), it should be possible and instructive to call the 'arrayQualityMetrics' function from the package of the same name on the data matrix. To do this, it is probably easiest to put the transformed data (and the sample metadata) into an ExpressionSet.

At some point, somebody will hopefully write more specialised quality metrics functionality for this application.

Best wishes
Wolfgang.
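A minimal sketch of that ExpressionSet route might look like the following. The data are simulated, log2(x + 1) again stands in for the variance stabilising transformation, and the object names and the "aqm_report" output directory are hypothetical. The Bioconductor packages Biobase and arrayQualityMetrics need to be installed; the code skips the corresponding steps if they are not.

```r
# Sketch only: simulated data; log2(x + 1) stands in for the variance
# stabilising transformation. Object and directory names are hypothetical.
set.seed(2)
counts <- matrix(rnbinom(600, mu = 100, size = 5), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:6)))
vsd <- log2(counts + 1)

# sample metadata; row names must match the column names of the data matrix
samples <- data.frame(condition = rep(c("A", "B"), each = 3),
                      row.names = colnames(vsd))

if (requireNamespace("Biobase", quietly = TRUE)) {
  # wrap the transformed matrix and the metadata in an ExpressionSet
  eset <- Biobase::ExpressionSet(
    assayData = vsd,
    phenoData = Biobase::AnnotatedDataFrame(samples))

  # the report is written to disk as HTML, so only build it in an
  # interactive session and only if the package is available
  if (interactive() &&
      requireNamespace("arrayQualityMetrics", quietly = TRUE)) {
    arrayQualityMetrics::arrayQualityMetrics(
      expressionset = eset, outdir = "aqm_report",
      force = TRUE, intgroup = "condition")
  }
}
```

The intgroup argument only controls which metadata column is used to colour and annotate the plots; here it points at the made-up "condition" column.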
Mark Robinson ★ 1.1k
@mark-robinson-2171
Last seen 10.2 years ago
Hi Rui.

I'll let the DESeq developers respond to your specific question, but I find that one useful visual is the output of plotMDS.dge() from the edgeR package.

Cheers,
Mark

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------
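For readers without edgeR at hand, the idea behind an MDS plot of samples can be sketched in base R. Note that this is not what plotMDS.dge() computes internally: the sketch below swaps in classical multidimensional scaling (cmdscale) on Euclidean distances of log2 counts per million, with simulated counts and made-up sample names s1 to s6, of which s6 is deliberately corrupted.

```r
# Sketch only: classical MDS via cmdscale() as a stand-in for edgeR's
# MDS plot; counts are simulated and sample names are hypothetical.
set.seed(3)
n_genes <- 500
mu <- rgamma(n_genes, shape = 2, scale = 50)
counts <- matrix(rnbinom(n_genes * 6, mu = mu, size = 10),
                 nrow = n_genes, dimnames = list(NULL, paste0("s", 1:6)))

# deliberately spoil one library by shuffling its gene means
counts[, "s6"] <- rnbinom(n_genes, mu = sample(mu), size = 10)

# scale to counts per million first so that library size alone
# does not drive the distances, then log-transform
cpm <- t(t(counts) / colSums(counts)) * 1e6
log_cpm <- log2(cpm + 1)

# two-dimensional classical MDS of the sample-to-sample distances
coords <- cmdscale(dist(t(log_cpm)), k = 2)

# label each point with its sample name
plot(coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(coords, labels = rownames(coords))
```

Replicates should cluster together in such a plot, and a sample sitting far away from its own group is worth a closer look.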