In the dream vignette it's described that for an RNA-seq repeated measure design the limma analysis needs two duplicateCorrelation
rounds, though this differs from the limma user's guide and manual. Below for convenience is the design:
> metadata
Individual Disease Experiment Sex DiseaseSubtype
sample_01 ID1 0 sample_01 F 0
sample_02 ID1 0 sample_02 F 0
sample_03 ID2 0 sample_03 M 0
sample_04 ID2 0 sample_04 M 0
sample_05 ID3 0 sample_05 F 0
sample_06 ID3 0 sample_06 F 0
sample_07 ID4 0 sample_07 M 0
sample_08 ID4 0 sample_08 M 0
sample_09 ID5 0 sample_09 F 0
sample_10 ID5 0 sample_10 F 0
sample_11 ID6 0 sample_11 M 0
sample_12 ID6 0 sample_12 M 0
sample_13 ID7 1 sample_13 F 1
sample_14 ID7 1 sample_14 F 1
sample_15 ID8 1 sample_15 M 1
sample_16 ID8 1 sample_16 M 1
sample_17 ID9 1 sample_17 F 1
sample_18 ID9 1 sample_18 F 1
sample_19 ID10 1 sample_19 M 2
sample_20 ID10 1 sample_20 M 2
sample_21 ID11 1 sample_21 F 2
sample_22 ID11 1 sample_22 F 2
sample_23 ID12 1 sample_23 M 2
sample_24 ID12 1 sample_24 M 2
and the limma two-round duplicateCorrelation
analysis code from the vignette, where the DE test is an ordinary two-group comparison ~Disease
:
# apply duplicateCorrelation is two rounds
design = model.matrix( ~ Disease, metadata)
vobj_tmp = voom( geneExpr, design, plot=FALSE)
dupcor <- duplicateCorrelation(vobj_tmp,design,block=metadata$Individual)
# run voom considering the duplicateCorrelation results
# in order to compute more accurate precision weights
# Otherwise, use the results from the first voom run
vobj = voom( geneExpr, design, plot=FALSE, block=metadata$Individual, correlation=dupcor$consensus)
# Estimate linear mixed model with a single variance component
# Fit the model for each gene,
dupcor <- duplicateCorrelation(vobj, design, block=metadata$Individual)
# But this step uses only the genome-wide average for the random effect
fitDupCor <- lmFit(vobj, design, block=metadata$Individual, correlation=dupcor$consensus)
# Fit Empirical Bayes for moderated t-statistics
fitDupCor <- eBayes( fitDupCor )
This differs from the limma user's guide which, for similar designs and DE test questions, describes only one duplicateCorrelation
round. Also, the limma manual says that specifying parameters like block
and correlation
in voom
only passes them to lmFit
and voom
doesn't use it for precision weights calculation. This would also make sense for only needing one round. Which way is the correct way?
I compared the vignette example with two rounds and the limma guide with one round and the p-values are not identical, the one-round has slightly higher p-values and slightly lower log-odds.
Might be a good idea for limma team, when they have the time, to update the relevant areas in the limma user's guide with this new recommended double approach and also in the limma reference manual
voom
section regarding...
additional parameters, because it made it seem setting parameters likeblock
andcorrelation
would only get them passed tolmFit
not actually being used in precision weights calc invoom
.I can't speak for the limma team and what they will or will not update, but you seem confused.
The fact that they are passed to
lmFit
, and hence use a linear mixed model instead of a conventional ANOVA is how the block and correlation arguments affect the estimation of precision weights. That's the whole point of the two-step approach (see the post by Bernd Klaus in the thread I reference above). So when the help page saysThat's completely accurate. One could argue that there could be other information about why you might want to do the two-step thing when using a linear mixed model, but there's always the tension between adding all possible relevant information in the age of TL;DR, and being concise because people won't read things.
My apologies yes I found that phrase confusing, since I thought it meant that passing
block
andcorrelation
options tovoom
would mean they weren't actually used there and simply passed to the subsequent call oflmFit(v, ...)
with thevoom
result so you wouldn't need to specifyblock
andcorrelation
again. But I see in the examples you linked above you passblock
andcorrelation
to bothvoom
andlmFit
for the second round sovoom
is using them not simply passing them.Yes. If you want
lmFit
to use a mixed model you have to supply ablock
andcorrelation
argument. That's just how it works (e.g., that's how you telllmFit
that you want a mixed model).The
voom
function is only used to estimate observation-level weights so you can use conventional linear models (which is whatlmFit
does), and the two-step procedure is simply done to update the weights to be more amenable to the mixed model rather than based on a fixed effects only model.It's true that if you have values in the 'weights' component of an EList then
lmFit
will automatically fit a weighted linear model, so one might assume that paradigm extends to other things like fitting a mixed model because of something thatvoom
might return.However, what
lmFit
does is based on whatgetEAWP
provides it, and there is documentation for that function (and the help page for the EList class, which is whatvoom
returns), so there's no need to assume anything - you can just read those help pages along with the one forlmFit
and know for sure what's up.Leandro, it is a good start that you are reading the manual help page for
voom
, but you need to read the "Value" section of the page as well. You can see from this section thatvoom()
only returns expression values and weights. It does not return blocks or correlations, so these could not possibly be passed on to any downstream functions. There is no way in R to pass on any information that is not stored in the output object.Very sorry, I do usually read the return value description I simply forgot in this case. Thank you for the help!