I am working with an RNA-seq dataset in which I need to use an offset matrix, and I've noticed that some functions, such as voom and cpm, don't seem to accept an offset matrix, nor do they use an offset matrix if it is present in the DGEList object. Is this an oversight or is there a specific reason that these functions should not use offsets?
That's an issue we've debated internally at some length. We decided not to do it, for convenience and safety. Take aveLogCPM as an example: once we add an offset matrix to a DGEList, should the existing vector of average abundances (if it exists) be recomputed? That would be a bother, as we'd have to intercept the assignment to the offset member in order to notify the other DGEList elements that they're out of date.
More importantly, does the scale of the average abundances make sense for arbitrary offsets (and ditto for the cpm function)? This is especially problematic, as you can change the magnitude of the average abundance simply by changing the size of the offsets. It's not immediately obvious that this would have an effect, as you're not changing the offsets' relative values between libraries (which is what matters for GLM fitting). However, if the relative sizes of the average abundances are altered across genes, this would interfere with estimation of the mean-dispersion trend in edgeR and of the mean-variance trend in voom.
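To make the point about offset magnitude concrete, here is a minimal sketch in plain Python rather than R. It is not edgeR's code: `offset_log_cpm` is a hypothetical helper that treats exp(offset) as the effective library size (offsets being on the natural-log scale), and the counts and offsets are made up for illustration.

```python
import math

def offset_log_cpm(counts, offsets, prior=0.5):
    """log2 CPM for one gene, treating exp(offset) as the effective library size.

    Hypothetical helper for illustration only; not an edgeR or limma function.
    """
    return [math.log2((c + prior) / math.exp(o) * 1e6)
            for c, o in zip(counts, offsets)]

# One gene observed in three libraries (made-up numbers).
counts = [10, 20, 40]
offsets = [math.log(1e6), math.log(2e6), math.log(4e6)]

# Add the same constant to this gene's offset in every library.
shifted = [o + math.log(8) for o in offsets]

base = offset_log_cpm(counts, offsets)
moved = offset_log_cpm(counts, shifted)

# Differences between libraries are unchanged by the shift ...
diffs_base = [b - base[0] for b in base]
diffs_moved = [m - moved[0] for m in moved]

# ... but the gene's average log-abundance drops by log2(8).
mean_base = sum(base) / len(base)
mean_moved = sum(moved) / len(moved)
print(diffs_base, diffs_moved)
print(mean_base - mean_moved)
```

If a different constant is applied to each gene, the ordering of genes along the abundance axis changes even though every within-gene comparison is untouched, which is what would disturb a trend fitted against average abundance.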
I guess you could avoid that by zero-centering the offsets for each gene prior to analysis, though I'm not sure how effective a solution that is. So, in short, we decided not to make those functions responsive to user-supplied offsets, to reduce the number of things that could stuff up. I guess we could add it as an option if there's a pressing need for it.
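As a rough sketch of what zero-centering the offsets for each gene means, in plain Python with made-up numbers: `center_offsets` is a hypothetical helper, not part of edgeR, and it only shows that centering removes any gene-wise constant; it says nothing about whether the result is on a sensible scale for CPM.

```python
def center_offsets(row):
    """Subtract the gene's mean offset so the row averages to zero.

    Hypothetical helper for illustration only.
    """
    mean = sum(row) / len(row)
    return [o - mean for o in row]

# Offsets for two genes across three libraries (made-up values).
# gene_b has the same between-library pattern as gene_a, shifted by a constant.
gene_a = [13.8, 14.5, 15.2]
gene_b = [o + 2.0 for o in gene_a]

centered_a = center_offsets(gene_a)
centered_b = center_offsets(gene_b)

# After centering, the gene-specific constant is gone and only the
# between-library differences remain.
print(centered_a)
print(centered_b)
```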
For what it's worth, I've implemented both cpm and voom with offsets, and when I use them with my offsets, which do happen to be zero-centered, I get reasonable results. I have not tested with non-centered offsets, and I admit I have not thought at all about offsets that would change the relative average abundance of genes.
(Also, on a related note, I noticed that the cpm calculation in voom is slightly different than calling cpm with prior.count=0.5 and log=TRUE. Is this intentional?)
See the thread "Differences between limma voom E values and edgeR cpm values?"
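For readers without that thread to hand, here is a small plain-Python sketch of the two formulas as I understand them from the limma and edgeR documentation; treat the details as an assumption and check them against the package sources. voom adds 0.5 to each count and 1 to each library size, whereas cpm with log=TRUE scales the prior count in proportion to each library's size and adds twice the scaled prior to the library size. The function names below are hypothetical.

```python
import math

def voom_style_log_cpm(count, lib_size):
    """log2 CPM as voom is understood to compute it: fixed 0.5 and 1 adjustments."""
    return math.log2((count + 0.5) / (lib_size + 1.0) * 1e6)

def edger_style_log_cpm(count, lib_size, mean_lib_size, prior_count=0.5):
    """log2 CPM with a prior count scaled by relative library size (assumed edgeR behaviour)."""
    scaled_prior = prior_count * lib_size / mean_lib_size
    return math.log2((count + scaled_prior) / (lib_size + 2.0 * scaled_prior) * 1e6)

# Two libraries of unequal size, same raw count (made-up numbers).
lib_sizes = [1e6, 4e6]
mean_lib = sum(lib_sizes) / len(lib_sizes)
counts = [5, 5]

voom_vals = [voom_style_log_cpm(c, n) for c, n in zip(counts, lib_sizes)]
edger_vals = [edger_style_log_cpm(c, n, mean_lib) for c, n in zip(counts, lib_sizes)]

print(voom_vals)
print(edger_vals)
```

Under these formulas the two agree exactly when all library sizes are equal, because the scaled prior is then 0.5 for every library; the values diverge as the library sizes become more unequal.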