Custom dispersion estimate DESeq2
Sindre ▴ 110
@sindre-6193

I am curious about a design with many conditions: say, a control group and a disease group, with two different treatments performed on both groups (e.g. pre-treatment, post-treatment 1 and post-treatment 2 values for both the control and disease groups).

Let's say the dispersion is very different from one condition to another: it's higher in the disease group than in the control group, very high in samples after treatment 1, and extremely high after treatment 2. Is it a valid option to supply a custom dispersion estimate calculated only from the control group pre-treatment?

deseq2 edger
@mikelove

I'll just say, as a matter of software, DESeq2 does not have any support for separate dispersion estimates across groups.

Aaron Lun ★ 28k
@alun

> Is it a valid option to supply a custom dispersion estimate calculated only from the control group pre-treatment?

Most certainly not. The variability in the treatment groups is real; dismissing it would be dangerous, because an underestimated dispersion overstates the significance of any differences.

The unstated question (which Mike touched on) is whether different dispersions are supported for each group. In the distant past, I added some functionality to edgeR to accept a matrix of dispersions - see, for example, the description of the dispersion= argument in glmFit(). (To be honest, I don't quite remember why I did this; it was probably something single-cell-related, and I haven't used it since.) This means that you could set up a matrix where, for each gene, all observations from the same group get one dispersion value and all observations in another group get a different one.
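To make the layout of such a matrix concrete, here is a minimal base-R sketch. The group labels, gene names and dispersion values are all made up for illustration; the only thing taken from the discussion above is that the dispersion= argument of glmFit() can take a genes-by-samples matrix.

```r
# Hypothetical design: 6 samples in two groups (labels are illustrative).
group <- factor(c("control", "control", "control",
                  "disease", "disease", "disease"))

# Assumed per-gene dispersions for each group (made-up numbers, genes in rows).
disp_by_group <- cbind(
  control = c(0.05, 0.10, 0.02, 0.08),
  disease = c(0.20, 0.40, 0.15, 0.30)
)
rownames(disp_by_group) <- paste0("gene", 1:4)

# Expand to one column per sample: each sample takes its own group's column,
# so all observations in a group share that group's dispersion for each gene.
disp_matrix <- disp_by_group[, as.character(group), drop = FALSE]
colnames(disp_matrix) <- paste0("sample", seq_along(group))

# disp_matrix now has the same dimensions as a 4 x 6 count matrix.
# With edgeR loaded and matching `counts` and `design` objects (not defined
# here), it could be passed along the lines of:
#   fit <- glmFit(counts, design, dispersion = disp_matrix)
print(disp_matrix)
```

The matrix has to line up with the count matrix row for row and column for column, so any gene filtering needs to happen before it is built.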

So it's possible, but that really just kicks the can down the road because you're faced with the problem of trying to estimate these group-specific dispersions. This is... also theoretically possible with the QL machinery in edgeR, but it would involve some experimentation. If you're curious, the general idea would be to (i) split the dataset into each group, (ii) run estimateDisp() on each subset of samples, (iii) cbind the trended dispersions together into a matrix with one column per sample, (iv) feed that matrix into glmQLFit(), and (v) hope for the best. Don't treat that as a recommendation, though; I have no idea how or if it will work out.
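For anyone who wants to experiment, steps (i) to (iii) might look roughly like the sketch below. It is not a tested recipe: the counts are simulated, the group names are invented, and each subset is fitted with an intercept-only design as an assumption. Step (iv) is left as a commented-out call because whether glmQLFit() handles a dispersion matrix sensibly, and in which edgeR versions, is exactly the part that needs experimentation.

```r
library(edgeR)

set.seed(1)
# Simulated counts purely for illustration: 1000 genes, 3 groups x 4 samples.
group <- factor(rep(c("ctrl", "trt1", "trt2"), each = 4))
counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:12)))
y <- DGEList(counts, group = group)
y <- calcNormFactors(y)

# (i)-(iii): estimate trended dispersions within each group separately, then
# repeat each group's per-gene values across that group's samples.
disp_matrix <- matrix(NA_real_, nrow(y), ncol(y),
                      dimnames = dimnames(y$counts))
for (g in levels(group)) {
  idx <- which(group == g)
  # Assumption: an intercept-only design within a single group.
  sub_design <- matrix(1, nrow = length(idx), ncol = 1)
  sub <- estimateDisp(y[, idx], design = sub_design)
  disp_matrix[, idx] <- sub$trended.dispersion  # recycled column by column
}

# (iv): untested, and may depend on the edgeR version in use.
# design <- model.matrix(~ group)
# fit <- glmQLFit(y, design, dispersion = disp_matrix)
```

With only a few replicates per group, each subset's dispersion trend is estimated from very little data, which is one more reason to treat the result with caution.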

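My standard approach for dealing with this situation would be to use voomWithQualityWeights() from limma, which estimates sample-specific quality weights so that the more variable samples are down-weighted.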
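A minimal sketch of that route, assuming the usual limma pipeline around it. The simulated counts, the group names and the choice of coefficient are illustrative only, and the function is called with its default settings.

```r
library(edgeR)
library(limma)

set.seed(1)
# Simulated counts purely for illustration: 1000 genes, 3 groups x 4 samples.
group <- factor(rep(c("ctrl", "trt1", "trt2"), each = 4))
counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:12)))
design <- model.matrix(~ group)

y <- DGEList(counts, group = group)
y <- calcNormFactors(y)

# voom precision weights combined with estimated sample quality weights;
# the combined weights end up in v$weights (genes x samples).
v <- voomWithQualityWeights(y, design)

# Standard limma fit using those weights.
fit <- eBayes(lmFit(v, design))
res <- topTable(fit, coef = "grouptrt1")  # trt1 vs ctrl in this toy design
```

Here the extra variability is absorbed through the weights rather than through group-specific dispersions, so no custom dispersion matrix is needed.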

P.S. I just noticed the title. If this is meant to be a DESeq2 question, are you just tagging the edgeR maintainers for fun? I'm not sure I like that.
