Question

Run DESeq with only a handful of genes


xinlian.zhang25 • 0

@xinlianzhang25-10779

Last seen 8.9 years ago

Hi! I want to run DESeq2 in an unusual setting: on only a few genes, which is not the typical use case. I ran into the following problem.

Here is my sample code.

library(DESeq2)
dds <- makeExampleDESeqDataSet()
# keep only the first three genes
dds <- dds[1:3,]
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
res <- results(dds)

For "dds <- estimateDispersions(dds)", I keep getting the following message.

gene-wise dispersion estimates
mean-dispersion relationship
-- note: fitType='parametric', but the dispersion trend was not well captured by the
function: y = a/x + b, and a local regression fit was automatically substituted.
specify fitType='local' or 'mean' to avoid this message next time.
Error in lfproc(x, y, weights = weights, cens = cens, base = base, geth = geth, :
newsplit: out of vertex space
In addition: There were 17 warnings (use warnings() to see them)

Sometimes that line succeeds, but then "dds <- nbinomWaldTest(dds)" fails with

Error in nbinomWaldTest(dds) :
testing requires dispersion estimates, first call estimateDispersions()

Any help will be appreciated. Thanks!

Xinlian

deseq2 Small number of genes software error • 1.6k views

8.9 years ago xinlian.zhang25 • 0


How many genes is a small number? Do you actually have data for all genes? If so I'd probably try estimating dispersions genome-wide first, and then subsetting to your genes of interest.
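A minimal sketch of that suggestion, assuming counts for all genes are available. The simulated data from makeExampleDESeqDataSet() and the choice of rows 1:3 as the genes of interest are stand-ins for illustration, not taken from the original question:

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a full data set (defaults: 1000 genes, 12 samples)
dds <- makeExampleDESeqDataSet()

# Estimate size factors and dispersions using all genes
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)

# Hypothetical genes of interest: subset only after dispersion estimation
dds_sub <- dds[1:3, ]

# Test just the subset; the dispersions estimated above are carried along
dds_sub <- nbinomWaldTest(dds_sub)

# Independent filtering is switched off here because it is designed for
# genome-scale tables, not a three-row result
res_sub <- results(dds_sub, independentFiltering = FALSE)
res_sub
```

Because testing runs on the subset, the padj column here is adjusted over those three genes only.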

8.9 years ago Ryan C. Thompson ★ 7.9k


Yes, Ryan is right. For each gene, DESeq2 uses the data from all other genes to improve the estimates of dispersion, even if there is only a small number of replicate experiments. (This is an instance of what's called an empirical Bayes approach; it can only work with sufficient numbers of genes, which in a sense make up for the lack of replication.)

There is no harm in running DESeq2 on all genes and then subsetting afterwards. If needed, you can restrict the multiple-testing adjustment to the subset by applying stats::p.adjust to just those genes' p-values.
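As a rough illustration of running on everything and subsetting afterwards, here is a sketch on simulated data. The gene selection (the first three row names) and the variable names are hypothetical, and Benjamini-Hochberg is just one of the methods p.adjust offers:

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a full data set (defaults: 1000 genes, 12 samples)
dds <- makeExampleDESeqDataSet()

# Standard pipeline on all genes
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
res <- results(dds)

# Hypothetical genes of interest, selected after testing
genes_of_interest <- rownames(dds)[1:3]
res_sub <- res[genes_of_interest, ]

# Re-adjust the raw p-values over the subset only
padj_subset <- p.adjust(res_sub$pvalue, method = "BH")

data.frame(gene = genes_of_interest,
           pvalue = res_sub$pvalue,
           padj_subset = padj_subset)
```

The padj column that results() reports is adjusted over the genome-wide table, so padj_subset is the value to use when only the selected genes count as the tested family.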

8.9 years ago Wolfgang Huber ★ 13k

score 0 · Answer 1 · 2016-05-26


xinlian.zhang25 • 0


My actual problem is that I want to do DE analysis on the exons within each gene, so I am treating each exon as a gene. I know this sounds silly, but for the exons in a gene I would presumably still need to adjust for library size and dispersion. That is why I wanted to give it a try.

8.9 years ago xinlian.zhang25 • 0


There is a package designed specifically for this:

http://bioconductor.org/packages/DEXSeq

You should use this instead.

8.9 years ago Michael Love 43k


That looks great! Thanks for letting me know!!!

8.9 years ago xinlian.zhang25 • 0