Hi all,
I have a question regarding DESeq differential expression analysis.
In DESeq, is there any way to detect whether the library from one sample
is totally screwed up?
Or, for a single gene, what if the expression is abnormal in one sample? (In
this situation, do we just discard this value or modify it?)
Thanks!
best,
Laurie
--
Rui Luo
Lab phone : 310 794-7537
Geschwind Lab
Human Genetics Department
UCLA
Hi Laurie
On 10/20/2010 07:25 AM, Rui Luo wrote:
> I have a question regarding DESeq differential expression analysis.
> In DESeq, is there any way to detect whether the library from one sample
> is totally screwed up?
> Or, for a single gene, what if the expression is abnormal in one sample? (In
> this situation, do we just discard this value or modify it?)
If you have enough replicates, you can detect an outlier sample from the
fact that it is markedly different from the rest.
Possible ways to do so:
- Make a heatmap of the samples after performing a variance-stabilizing
  transformation on the count data. This is described in the DESeq
  vignette. The heatmap shows you how "different" each sample is from
  every other sample, and if one sample is very different from its
  replicates, you may want to consider excluding it from the analysis.
- For each sample, make an MA plot comparing it to the "fictive
  reference" that I described in my reply to your other question, as
  follows:
library(DESeq)

# get an example count data set -- or use your own data:
cds <- makeExampleCountDataSet()

# estimate the size factors:
cds <- estimateSizeFactors( cds )

# calculate the gene-wise geometric means across all samples
# (a gene with a zero count in any sample gets a geometric mean of 0
# and is left out of the log-scale plot below)
geomeans <- exp( rowMeans( log( counts(cds) ) ) )

# choose the sample we want to check
j <- 1

# plot the fold change of sample j versus the reference against
# the geometric mean, both on log scales
plot( geomeans, counts(cds)[,j] / geomeans, pch='.', log="xy" )

# mark the size factor (0 log fold change):
abline( h = sizeFactors(cds)[j] )
An odd sample should stick out by looking different. You could also take
the geometric mean not over all samples but only over replicate samples,
or you could simply plot two samples against each other.
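The heatmap check from the first bullet can be sketched in base R. This is only a minimal illustration under stated assumptions: the counts are simulated, sample S6 is disturbed on purpose, and log2(count + 1) is used as a crude stand-in for the variance-stabilizing transformation described in the DESeq vignette. The names log_counts, sample_dist and suspect are made up for this example.

```r
set.seed(1)
n_genes <- 1000
n_samples <- 6

# Simulated per-gene mean expression and negative binomial counts
# (illustrative data only; replace with your own count matrix)
mu <- 2^runif(n_genes, 3, 10)
counts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("S", seq_len(n_samples))))

# Disturb sample S6 on purpose so that there is something to find
counts[, 6] <- rnbinom(n_genes, mu = mu * exp(rnorm(n_genes, sd = 1)), size = 10)

# ASSUMPTION: log2(count + 1) stands in for a proper
# variance-stabilizing transformation
log_counts <- log2(counts + 1)

# Euclidean distances between samples (columns)
sample_dist <- as.matrix(dist(t(log_counts)))

# Heatmap of the sample-to-sample distances
heatmap(sample_dist, symm = TRUE, scale = "none",
        main = "Sample-to-sample distances")

# The sample with the largest average distance to all others
mean_dist <- rowSums(sample_dist) / (n_samples - 1)
suspect <- names(which.max(mean_dist))
print(suspect)
```

With real data you would compare each sample mainly against its own replicates, since samples from different conditions are expected to differ.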
Remember that there are also what we call "variance outliers", i.e.,
single genes that vary much more across replicates than the variance fit
would suggest. The vignette tells you how to recognize them.
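To illustrate the idea of a variance outlier (this is not the diagnostic from the DESeq vignette, only a rough base-R sketch), one can compare each gene's variance across replicates with a smooth mean-variance trend. The simulated data, the planted gene "gene42", the lowess trend and the 20-fold cutoff are all assumptions chosen for this example.

```r
set.seed(2)
n_genes <- 2000
n_reps <- 4

# Simulated replicate counts (illustrative data only)
mu <- 2^runif(n_genes, 4, 10)
counts <- matrix(rnbinom(n_genes * n_reps, mu = mu, size = 20),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("rep", seq_len(n_reps))))

# Plant one gene whose count is wildly off in a single replicate
counts["gene42", 1] <- counts["gene42", 1] * 50 + 1000

# Per-gene mean and variance across the replicates
gene_mean <- rowMeans(counts)
gene_var <- apply(counts, 1, var)
keep <- gene_mean > 0 & gene_var > 0

# ASSUMPTION: a lowess curve on the log-log scale stands in for
# a proper variance fit
x <- log(gene_mean[keep])
y <- log(gene_var[keep])
o <- order(x)
trend <- numeric(length(x))
trend[o] <- lowess(x, y, f = 0.5)$y   # lowess returns values sorted by x

# Flag genes whose variance exceeds the trend more than 20-fold
# (arbitrary illustrative cutoff)
excess <- y - trend
flagged <- names(excess)[excess > log(20)]
print(flagged)
```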
Simon
Use "Array Quality Metrics":
Hi Laurie,
To follow up on Simon's suggestion: after a variance-stabilising
transformation of the counts (this transformation is logarithm-like for
high counts and square-root-like for low counts), it should be possible
and instructive to call the 'arrayQualityMetrics' function from the
package of the same name on the data matrix. To do this, it is probably
easiest to put the transformed data (and the sample metadata) into an
ExpressionSet.
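A sketch of how this could look, assuming the Biobase and arrayQualityMetrics packages are installed. The matrix vsd below is simulated and only stands in for your variance-stabilised data; the helper run_aqm, the column "condition" and the output directory "aqm_report" are hypothetical names for this example.

```r
set.seed(3)

# Illustrative stand-in for the transformed data (genes x samples)
vsd <- matrix(log2(rnbinom(600, mu = 200, size = 10) + 1), nrow = 100,
              dimnames = list(paste0("gene", 1:100), paste0("S", 1:6)))

# Sample metadata: one row per sample, row names matching the
# column names of the data matrix
pheno <- data.frame(condition = rep(c("A", "B"), each = 3),
                    row.names = colnames(vsd))

# Hypothetical helper: wrap the data in an ExpressionSet and
# write a quality report to 'outdir'
run_aqm <- function(vsd, pheno, outdir = "aqm_report") {
  library(Biobase)
  library(arrayQualityMetrics)
  eset <- ExpressionSet(assayData = vsd,
                        phenoData = AnnotatedDataFrame(pheno))
  arrayQualityMetrics(expressionset = eset,
                      outdir = outdir,
                      force = TRUE,             # overwrite an existing report
                      do.logtransform = FALSE,  # data are already transformed
                      intgroup = "condition")
}

# Usage, once both packages are available:
# run_aqm(vsd, pheno)
```

The row names of the metadata have to match the column names of the matrix, otherwise the ExpressionSet cannot be constructed.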
At some point, somebody will hopefully write more specialised quality
metrics functionality for this application.
Best wishes
Wolfgang.
On Oct/20/10 11:37 AM, Simon Anders wrote:
> [...]
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
Hi Rui.
I'll let the DESeq developers respond to your specific question, but I
find that one useful visual is the output of plotMDS.dge() from the
edgeR package.
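For a rough idea of what such an MDS view shows, here is a base-R sketch. It assumes simulated counts with one deliberately disturbed sample (S6) and uses classical MDS (cmdscale) on Euclidean distances between log2(count + 1) profiles, which is a simplification and not necessarily the distance that edgeR uses.

```r
set.seed(4)
n_genes <- 1000
n_samples <- 6

# Simulated counts (illustrative data only)
mu <- 2^runif(n_genes, 3, 10)
counts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("S", seq_len(n_samples))))

# Disturb sample S6 on purpose
counts[, 6] <- rnbinom(n_genes, mu = mu * exp(rnorm(n_genes, sd = 1)), size = 10)

# ASSUMPTION: plain log2(count + 1) profiles and Euclidean distance
log_counts <- log2(counts + 1)

# Classical multidimensional scaling into two dimensions
mds <- cmdscale(dist(t(log_counts)), k = 2)

# Plot the samples by name; replicates should cluster together
plot(mds, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
text(mds, labels = rownames(mds))

# The sample farthest from the centre of the configuration
far <- rownames(mds)[which.max(sqrt(rowSums(mds^2)))]
print(far)
```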
Cheers,
Mark
On 2010-10-20, at 4:25 PM, Rui Luo wrote:
> [...]
------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------