Is it a valid option to supply a custom dispersion estimate calculated only from the control group pre-treatment?
Most certainly not. The variability in the treatments is real; dismissing it would be dangerous.
The unsaid question (that Mike touched on) is whether different dispersions are supported for each group. In the distant past, I added some functionality to edgeR to accept a matrix of dispersions - see, for example, the description of the dispersion= argument in glmFit(). (To be honest, I don't quite remember why I did this; it was probably something single-cell-related, and I haven't used it since.) This means that you could set up a matrix where, for each gene, all observations from the same group get one dispersion value and all observations in another group get another dispersion.
So it's possible, but that really just kicks the can down the road because you're faced with the problem of trying to estimate these group-specific dispersions. This is... also theoretically possible with the QL machinery in edgeR, but it would involve some experimentation. If you're curious, the general idea would be to (i) split the dataset by group, (ii) run estimateDisp() on each subset of samples, (iii) cbind the trended dispersions together into a matrix, (iv) feed that matrix into glmQLFit() and (v) hope for the best. Don't treat that as a recommendation, though; I have no idea how or if it will work out.
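For the curious, steps (i) to (v) might look roughly like the sketch below. This is untested against real data and carries the same caveat as above. The simulated counts, the group labels and the intercept-only design used within each subset are all assumptions made for illustration, and whether glmQLFit() handles a dispersion matrix sensibly may depend on your edgeR version.

```r
library(edgeR)

# Hypothetical toy data: 1000 genes, 4 control and 4 treated samples.
set.seed(1)
ngenes <- 1000
group <- factor(rep(c("control", "treated"), each = 4))
counts <- matrix(rnbinom(ngenes * length(group), mu = 50, size = 5),
                 nrow = ngenes)

y <- DGEList(counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~group)

# (i)-(iii): estimate trended dispersions within each group and
# assemble them into a gene-by-sample matrix.
disp <- matrix(NA_real_, nrow(y), ncol(y))
for (g in levels(group)) {
  idx <- which(group == g)
  # Intercept-only design, as each subset contains a single group.
  sub <- estimateDisp(y[, idx], design = matrix(1, length(idx), 1))
  # The per-gene vector is recycled down each of this group's columns.
  disp[, idx] <- sub$trended.dispersion
}

# (iv): feed the matrix into the QL fit; (v): hope for the best.
fit <- glmQLFit(y, design, dispersion = disp)
res <- glmQLFTest(fit, coef = 2)
topTags(res)
```

Note that any gene filtering has to happen before the split, so that every subset keeps the same rows in the same order.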
My standard approach for dealing with this situation would be to use voomWithQualityWeights().
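A minimal sketch of that route, assuming a simple two-group design; the simulated counts and group labels are placeholders for illustration, not anything from the original question:

```r
library(edgeR)  # attaches limma as well

# Hypothetical toy data: 1000 genes, 4 control and 4 treated samples.
set.seed(1)
ngenes <- 1000
group <- factor(rep(c("control", "treated"), each = 4))
counts <- matrix(rnbinom(ngenes * length(group), mu = 50, size = 5),
                 nrow = ngenes)

y <- DGEList(counts, group = group)
y <- calcNormFactors(y)
design <- model.matrix(~group)

# Combine the voom mean-variance trend with sample-specific quality
# weights, so that more variable samples are downweighted.
v <- voomWithQualityWeights(y, design)

# Standard limma pipeline on the weighted log-CPM values.
vfit <- lmFit(v, design)
vfit <- eBayes(vfit)
topTable(vfit, coef = 2)
```

By default the weights are estimated per sample. If you would rather have one weight per group, the var.group= argument of voomWithQualityWeights() may be worth a look, though that choice goes beyond what was suggested above.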
P.S. I just noticed the title. If this is meant to be a DESeq2 question, are you just tagging the edgeR maintainers for fun? I'm not sure I like that.