Flavia Nunes
Dear List,
I am trying to use DESeq to analyse a dataset with samples from 3 healthy and 3 diseased microbial communities, and we are trying to establish which OTUs are significantly more or less abundant in the healthy vs. diseased samples.
I tried running the new version of DESeq (1.8.3) on both a Mac and a PC running the latest version of R (2.15). Both give a strange result, where all OTUs have a padj value >0.7. I found this strange because, looking at the raw count data, it is obvious that some OTUs are abundant in one treatment (say, high counts in all of the healthy samples) and absent in the other (0 or close to 0 in all of the diseased samples).
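To make that pattern concrete, here is a minimal base R sketch on invented counts (not the real data). The table `toy` and the thresholds `high` and `low` are hypothetical, chosen only for illustration; this is an eyeball check, not a statistical test.

```r
# Toy counts (hypothetical, not the real data): rows are OTUs, columns are samples.
toy <- matrix(
  c(  0,   1,   0, 250, 310, 180,   # abundant in healthy, near zero in diseased
     40,  55,  38,  42,  60,  35,   # similar in both groups
    500, 420, 610,   2,   0,   1),  # abundant in diseased, near zero in healthy
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU_a", "OTU_b", "OTU_c"),
                  c("D1", "D2", "D3", "H1", "H2", "H3"))
)
conds <- factor(c("D", "D", "D", "H", "H", "H"))

# Hypothetical cut-offs for "high counts" and "0 or close to 0".
high <- 100
low  <- 5

diseased <- toy[, conds == "D", drop = FALSE]
healthy  <- toy[, conds == "H", drop = FALSE]

# TRUE where every healthy sample is high and every diseased sample is near zero.
healthy_only <- apply(healthy, 1, min) >= high & apply(diseased, 1, max) <= low
# The reverse pattern.
diseased_only <- apply(diseased, 1, min) >= high & apply(healthy, 1, max) <= low

flagged <- rownames(toy)[healthy_only | diseased_only]
print(flagged)
```

OTUs flagged this way are the ones I would expect to see with small adjusted p-values.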
I asked a colleague to help me with the analysis, and he ran it on an older version of DESeq (1.4), using the estimateVarianceFunction command instead of estimateDispersions. According to the help file for estimateDispersions, using the options sharingMode="fit-only", fitType="local" should give the same result as estimateVarianceFunction. However, this is not the case. DESeq 1.4 found 54 OTUs that differed significantly between healthy and diseased samples, while DESeq 1.8.3 found none.
In a second attempt, we used the option method="per-condition" and this worked: I got the same 54 significant p-values as in the analysis with DESeq 1.4. But when I continued the analysis on other datasets (we have a number of different conditions), I again started to get odd p-values, such as 1.00 for every OTU. I changed the settings of the estimateDispersions command, trying different methods, and each time I got a different set of p-values, but usually very high ones, close to 1.
It seems to me that the results are very sensitive to the method used to estimate dispersions, and I was wondering which properties of the data I should look at in order to select the best method.
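One property that could be inspected is how each OTU's variance across replicates compares with its mean. The sketch below uses invented counts for a single condition and a simple method-of-moments estimate under the assumption var = mean + dispersion * mean^2. It ignores size factors and is not DESeq's own estimator; the names `reps`, `disp` and `summary_tab` are hypothetical.

```r
# Hypothetical counts for one condition (three replicates), not the real data.
reps <- matrix(
  c( 10,  12,  11,
    100, 160,  70,
      0,   0,   3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("OTU_a", "OTU_b", "OTU_c"), c("H1", "H2", "H3"))
)

m <- rowMeans(reps)        # per-OTU mean across replicates
v <- apply(reps, 1, var)   # per-OTU sample variance across replicates

# Method-of-moments dispersion under var = mean + disp * mean^2,
# floored at 0 when the variance does not exceed the mean.
disp <- pmax((v - m) / m^2, 0)

summary_tab <- data.frame(mean = m, variance = v, dispersion = disp)
print(summary_tab)
```

With only three replicates these per-OTU values are very noisy, which is presumably why the choice of how to share information across OTUs matters so much.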
Another unusual thing I have noticed is that when I plot the dispersion estimates, the fit line deviates from the points towards the right side of the graph. This suggests to me that something is wrong with the fit, but I do not know how to change the settings to get a better one.
I wanted to know if anyone on the list has come across a similar problem?
I am using the commands below in DESeq. I can provide the data files, as well as the results I am getting, to anyone who might be interested in taking a closer look.
WBDCountTable <- read.table( file.choose(), header=TRUE, row.names=1 )
WBDDesign <- data.frame(
  row.names = colnames( WBDCountTable ),
  condition = c( "D1", "D2", "D3", "H1", "H2", "H3" ),
  libType = c( "single-end", "single-end", "single-end",
               "single-end", "single-end", "single-end" ) )
conds <- factor( c( "D", "D", "D", "H", "H", "H" ) )
cds <- newCountDataSet( WBDCountTable, conds )
cds <- estimateSizeFactors( cds )
cds <- estimateDispersions( cds, method="per-condition", fitType="local" )
plotDispEsts <- function( cds )
{
  plot(
    rowMeans( counts( cds ) ),
    fitInfo(cds)$perGeneDispEsts,
    pch = 16, cex=1, log="xy" )
  xg <- 10^seq( -.5, 5, length.out=300 )
  lines( xg, fitInfo(cds)$dispFun( xg ), col="red" , lwd=3)
}
res <- nbinomTest( cds, "D", "H" )
plotDE <- function( res )
  plot( res$baseMean, res$log2FoldChange, log="x", pch=20, cex=.3,
        col = ifelse( res$padj < .1, "red", "black" ) )
plotDE( res )
res
--
Flávia Nunes
EMBO Postdoctoral Fellow
Laboratory of Artificial & Natural Evolution
Department of Genetics & Evolution
University of Geneva
Sciences III, 30 quai Ernest Ansermet
1211 Geneva 4 Switzerland
<flavia.nunes@unige.ch>