I'm running DESeq2_1.6.3 in R_3.1.2 to compare expression levels between two conditions, with a dozen individuals in each condition. The basic code looks like this:
complete_table <- merge(control_table_trimmed, exp_table_trimmed, by=0, all = TRUE)
completeCondition <- data.frame(condition=factor(c(rep("control", length(control_files)), rep("experimental", length(exp_files)))))
dds <- DESeqDataSetFromMatrix(complete_table, completeCondition, ~ condition)
dds <- DESeq(dds)
res <- results(dds, contrast = c("condition", "experimental", "control"))
I haven't had any problems in the past obtaining a results summary table from DESeq2, where the fit is parametric and all levels are compared between the two conditions. However, in my most recent experiment, DESeq2 instead has chosen the mean for my most abundant gene's expression level and set this as identical across all samples, and appears to be comparing all other values to this most abundant gene's expression level.
For example, expression counts for my most abundant gene range from 10,000-30,000. In counts(dds), the expression count for this gene is set to 23,000 for ALL of my individuals, which then skews the rest of my results.
The DESeq2 summary:
baseMean | controlMean | experimentalMean | log2FoldChange | lfcSE | stat | pvalue | padj | |
Candidatus | 2.14E+04 | 21356.92633 | 21356.92633 | -4.96E-11 | 0.5270526 | -9.42E-11 | 1.00E+00 | 1.00E+00 |
Those baseMean, controlMean, and experimentalMean values are incorrect, and should not be identical.
I do see that DESeq2, for this particular dataset, appears to be using a local fit, claiming that the parametric fit fails. This may or may not be at fault, but it's the only deviation in DESeq2's activity from previous runs that I've observed.
Why is DESeq2 suddenly trying to use this single mean value as the baseline against which all other values are apparently compared?