Hi! I am interested in doing analysis in a special setting, which is that I only want to run DESeq on a few genes and this is not usually assumed. I run into the following problem.
Here is my sample code.
dds <- makeExampleDESeqDataSet()
dds <- dds[1:3,]
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
res <- results(dds)
For "dds <- estimateDispersions(dds)", I keep getting the following message.
gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
Error in lfproc(x, y, weights = weights, cens = cens, base = base, geth = geth, :
newsplit: out of vertex space
In addition: There were 17 warnings (use warnings() to see them)
Sometimes if this line passes, then for the line "dds <- nbinomWaldTest(dds)", I keep getting
Error in nbinomWaldTest(dds) :
testing requires dispersion estimates, first call estimateDispersions()
Any help will be appreciated. Thanks!
XInlian
How many genes is a small number? Do you actually have data for all genes? If so I'd probably try estimating dispersions genome-wide first, and then subsetting to your genes of interest.
Yes, Ryan is right. For each gene,
DESeq2
uses the data from all other genes to improve the estimates of dispersion, even if there is only a small number of replicate experiments. (This is an instance of what's called an empirical Bayes approach; it can only work with sufficient numbers of genes, which in a sense make up for the lack of replication.)There is no harm in running
DESeq2
on all genes, and then subset afterwards. You can limit the multiple testing computations (if needed) to the subset (stats::p.adjust
).