Hello all,
I have a few questions about the linear model in limma.
I am doing limma/voom on my expression results, looking for differences between treatment1 and treatment2. I have several batches, so my design is model.matrix(~batch + treatment). My understanding is that with this design, limma first compares treatment1 and treatment2 samples within each batch and then somehow combines the results of different batches together. My first questions:
1. I assume that the linear model gives different batches equal weights when combining the results from each batch, even if batches contain different number of samples, am I right?
Additionally, most of my batches contain both treatment1 and treatment2 samples. Let's call these dual-treatment batches. However, a few batches unfortunately only contain treatment1 samples (or only treatment2 samples). Let's call these uni-treatment batches. Additional questions:
2. Are uni-treatment batches simply dismissed in the linear model calculation with the above design? I don't get any error messages when running lmFit and I get meaningful results from topTable. But I wonder how linear model can run without dismissing the uni-treatment batches. For example, if one batch has 0 treatment1 samples and 3 treatment2 samples, how can limma make a meaningful comparison between treatment1 and treatment2 within that batch?
3. If I manually remove samples in uni-treatment batches ahead of my design (so I end up with fewer batches, all dual-treatment), my topTable results are slightly different from my results in question 2 (slightly different logFC values, moderate differences in p values, and more noticeable differences in AveExpr values for each gene). Overall, for FDR<0.05, I get 826 genes vs 906 genes comparing the two methods. If the linear model simply dismisses uni-treatment batches in its calculation (I don't see how it could not) then I'd think the results under question 2 and question 3 should be the same.
Sorry for the long questions and thanks for your thoughts and answers.
Ali
Thank you Aaron very much for that explanation, which helped my understanding. My earlier understanding came from a simplistic statement in limma vignette under "Blocking" which said "In this type of analysis, the treatments are compared only within each block."
I also thought of another item that potentially adds to the difference I see between results in questions 2 and 3, and that is the normalization factor calculation as well as voom transformation, which are affected when I remove samples from the analysis.
The comment about blocking in the limma User's Guide is correct and does not imply separate analyses for each block. Treatments are compared within each block, and these comparisons are combined, but the combining is done automatically as part of fitting a single linear model. That is how linear models work. If I may say so, it was actually your interpretation rather than the original statement that was too "simplistic".
Yes, normalization changes will affect downstream results if you remove some of the samples.
PS. I have moved you answer to be a comment. For future reference, if you want to reply to Aaron, please use "ADD COMMENT" or "ADD REPLY" rather than posting an answer to your own question.
Thank you Gordon. Point taken. And thanks for clarifying the commenting function.