Batch effect: ComBat or blocking in limma?
@eleonoregravier-8219
France

Hi bioC community,

Until now I have used ComBat to remove the batch effects present in my datasets.

After reading the article "Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses" (http://biostatistics.oxfordjournals.org/content/early/2015/08/31/biostatistics.kxv027.full.pdf), I decided to block on the batch effect in limma whenever the groups are distributed across the batches in an unbalanced manner.

But I am wondering what is recommended when the groups are balanced across the batches. Should I use ComBat and then limma without accounting for the batch effect, or should I include the batch variable additively (blocking) in the limma analysis? Which approach is more powerful? Do you have any recommendations?

Thanks in advance for your help,

Eléonore

@ryan-c-thompson-5618
Icahn School of Medicine at Mount Sinai…

In general, I would prefer to model batch effects by including them in the design matrix and fitting the model to the unadjusted data. This ensures that the linear model properly accounts for the degrees of freedom used to estimate the batch effect, so that it uses the correct number of residual df. This prevents the linear model from overestimating significance because it thinks there are more residual df than there really are.
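To make the residual-df point concrete, here is a minimal Python sketch (not limma code). It builds a treatment-coded design matrix in the spirit of R's `model.matrix(~ group + batch)` and counts residual df as the number of samples minus the rank of the design. The sample labels and the helper names `model_matrix`, `matrix_rank` and `residual_df` are hypothetical, chosen only for illustration.

```python
# Sketch: residual df of a gene-wise linear model with and without a batch term.
# Helper names and the sample layout are hypothetical illustrations, not limma's API.


def model_matrix(group, batch=None):
    """Treatment-coded design: intercept plus one indicator per non-reference level.

    Loosely mirrors R's model.matrix(~ group) or model.matrix(~ group + batch).
    """
    factors = [group] if batch is None else [group, batch]
    rows = [[1.0] for _ in group]  # intercept column
    for factor in factors:
        levels = sorted(set(factor))
        for level in levels[1:]:  # first level is the reference, so no column
            for row, value in zip(rows, factor):
                row.append(1.0 if value == level else 0.0)
    return rows


def matrix_rank(rows, tol=1e-9):
    """Rank by Gauss-Jordan elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: this column is linearly dependent
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(n_rows):
            if r != rank:
                scale = m[r][col] / m[rank][col]
                m[r] = [a - scale * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def residual_df(design):
    """Residual df = number of samples minus the rank of the design matrix."""
    return len(design) - matrix_rank(design)


group = ["ctrl"] * 4 + ["trt"] * 4
batch = ["b1", "b2"] * 4  # balanced: each group has two samples per batch

df_group_only = residual_df(model_matrix(group))
df_blocked = residual_df(model_matrix(group, batch))
print("~ group:        ", df_group_only, "residual df")
print("~ group + batch:", df_blocked, "residual df")
```

With 8 samples, adding a two-level batch factor costs one residual df (6 versus 5). In general, a batch factor with B levels costs B - 1 df.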

Alternatively, I believe you could correct batch effects through some other method (e.g. ComBat) and then manually subtract the appropriate number of residual df from the limma fit object before computing the moderated statistics (i.e. before eBayes and topTable). I haven't actually tried this, though.
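The untested two-step idea can be illustrated for a single gene with the Python sketch below. As a stand-in for ComBat, it assumes a crude location-only batch adjustment: subtract each batch mean and add back the grand mean. ComBat itself also rescales batch variances and shrinks its batch estimates across genes with empirical Bayes. The data values are made up. The closed-form one-step fit used for comparison is valid only when every group-by-batch cell has the same number of samples.

```python
from statistics import mean


def center_batches(y, batch):
    """Location-only batch adjustment: remove each batch mean, add back the grand mean.

    A crude stand-in for ComBat, which also rescales batch variances and
    shrinks batch parameters across genes with empirical Bayes.
    """
    grand = mean(y)
    batch_means = {b: mean(v for v, bb in zip(y, batch) if bb == b) for b in set(batch)}
    return [v - batch_means[b] + grand for v, b in zip(y, batch)]


def group_rss(y, group):
    """Residual sum of squares after fitting one mean per group (model ~ group)."""
    group_means = {g: mean(v for v, gg in zip(y, group) if gg == g) for g in set(group)}
    return sum((v - group_means[g]) ** 2 for v, g in zip(y, group))


def additive_rss_balanced(y, group, batch):
    """RSS of the one-step model ~ group + batch.

    Uses fitted = group mean + batch mean - grand mean, which is the least-squares
    fit only when every group x batch cell has the same size.
    """
    grand = mean(y)
    gm = {g: mean(v for v, gg in zip(y, group) if gg == g) for g in set(group)}
    bm = {b: mean(v for v, bb in zip(y, batch) if bb == b) for b in set(batch)}
    return sum((v - (gm[g] + bm[b] - grand)) ** 2 for v, g, b in zip(y, group, batch))


group = [0, 0, 0, 0, 1, 1, 1, 1]
batch = [0, 1, 0, 1, 0, 1, 0, 1]  # balanced: 2 samples per group x batch cell
y = [5.1, 6.3, 4.8, 6.0, 7.2, 8.1, 6.9, 8.4]  # made-up log-expression for one gene

rss = group_rss(center_batches(y, batch), group)
n, n_groups, n_batches = len(y), len(set(group)), len(set(batch))
naive_df = n - n_groups                    # what a group-only fit assumes
corrected_df = naive_df - (n_batches - 1)  # subtract the df spent on batch
print(f"two-step RSS = {rss:.4f}, one-step RSS = {additive_rss_balanced(y, group, batch):.4f}")
print(f"naive sigma^2 = {rss / naive_df:.4f} on {naive_df} df")
print(f"corrected sigma^2 = {rss / corrected_df:.4f} on {corrected_df} df")
```

In this balanced case, the two-step residual sum of squares matches the one-step `~ group + batch` fit exactly, so the only discrepancy is the df. Dividing by the naive df understates the residual variance, and subtracting `n_batches - 1` removes that bias. With unbalanced designs, the two approaches no longer agree this neatly.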


Thanks Efstathios and Ryan, your answers are very helpful to me,

Best

Eléonore

@w-evan-johnson-5447
United States

Here is my opinion on the matter: 

1. First off, I want to point out one major flaw in the paper you mentioned. While it is true that ComBat adjustment does introduce a systematic correlation structure in the data (any linear-model-based approach will!), the problem is not ComBat but rather the assumption that the data are independent after adjustment. So it is NOT appropriate to apply a standard t-test or Limma after adjustment with ComBat. Instead, one should apply an approach for correlated data, such as weighted least squares. This paper misrepresents the true problem for the sake of being controversial.

2. For balanced batch designs, this is not really a problem, because the correlation structure induced by linear modeling is orthogonal to the treatment variable of interest. However, for unbalanced designs, the correlation structure is associated with the treatment variable, so the impact of ignoring correlation in the data is much more severe.

3. For straightforward tasks, such as differential expression, I would recommend blocking in Limma. It is usually better to do a one-step procedure than a two-step one.

4. The one exception is as follows. The main difference between what Limma does and what ComBat does is that ComBat adjusts for differences in both the mean and the variance across batches. Limma (I believe; Gordon, please confirm) assumes that the batch variances are the same and only accounts for mean differences across the batches. So if there are large differences in batch variances, it might still be better to use ComBat. If there are no large variance differences, then Limma should be the best option.
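The orthogonality behind points 2 and 3 can be seen in a noise-free toy example. The Python sketch below does not reproduce the correlation argument about ComBat. It only compares the treatment effect estimated from `y ~ group` and from `y ~ group + batch`, both fitted by ordinary least squares, in a balanced and an unbalanced layout. The effect sizes (`TRUE_EFFECT`, `BATCH_SHIFT`) and the sample layouts are hypothetical.

```python
def ols(X, y):
    """Ordinary least squares by solving the normal equations (X'X) b = X'y.

    Gauss-Jordan elimination is fine for these tiny toy designs.
    """
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(p)]
    a = [row + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix [X'X | X'y]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * w for u, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def group_effect(group, batch, y, block):
    """Treatment coefficient from ~ group (block=False) or ~ group + batch (block=True)."""
    X = [[1.0, float(g)] + ([float(b)] if block else []) for g, b in zip(group, batch)]
    return ols(X, y)[1]


TRUE_EFFECT, BATCH_SHIFT = 1.0, 2.0  # hypothetical values
group = [0, 0, 0, 0, 1, 1, 1, 1]
designs = {
    "balanced": [0, 1, 0, 1, 0, 1, 0, 1],    # 2 of 4 samples per group in batch 1
    "unbalanced": [0, 0, 0, 1, 0, 1, 1, 1],  # 1 of 4 vs 3 of 4 in batch 1
}
results = {}
for name, batch in designs.items():
    y = [TRUE_EFFECT * g + BATCH_SHIFT * b for g, b in zip(group, batch)]  # no noise
    results[name] = (group_effect(group, batch, y, block=False),
                     group_effect(group, batch, y, block=True))
    print(name, "ignore batch: %.3f, block on batch: %.3f" % results[name])
```

In the balanced layout, ignoring batch leaves the treatment estimate unchanged. The batch variation ends up in the residuals instead, which costs power. In the unbalanced layout, the batch shift leaks into the treatment estimate unless batch is in the model.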

Thanks!

Evan


In limma, you can use the arrayWeights function to account for differences in variance or sample quality between samples or groups.
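As a rough illustration of how down-weighting a noisier batch works, here is a Python sketch. It estimates one inverse-variance weight per batch from residuals pooled over genes, then uses those weights in a weighted group-mean fit. This is not the `arrayWeights` algorithm: limma estimates relative quality weights for each array by fitting a variance model across all genes. The toy matrix is made up, with residuals of size 1 in batch A and 2 in batch B.

```python
from statistics import mean


def batch_variance_weights(expr, group, batch):
    """Inverse-variance weight per batch, pooled over genes.

    expr: list of genes, each a list of per-sample log-expression values.
    Residuals come from a simple per-gene group-mean fit (no df correction).
    """
    sq_resid = {b: [] for b in set(batch)}
    for gene in expr:
        gm = {g: mean(v for v, gg in zip(gene, group) if gg == g) for g in set(group)}
        for v, g, b in zip(gene, group, batch):
            sq_resid[b].append((v - gm[g]) ** 2)
    return {b: 1.0 / mean(vals) for b, vals in sq_resid.items()}


def weighted_group_means(gene, group, batch, weights):
    """Weighted least-squares fit of ~ group for one gene, i.e. weighted group means."""
    out = {}
    for g in sorted(set(group)):
        pairs = [(v, weights[b]) for v, gg, b in zip(gene, group, batch) if gg == g]
        out[g] = sum(v * w for v, w in pairs) / sum(w for _, w in pairs)
    return out


group = [0, 0, 0, 0, 1, 1, 1, 1]
batch = ["A", "B", "A", "B", "A", "B", "A", "B"]
expr = [
    [1, 2, -1, -2, 2, 3, 0, -1],  # group means 0 and 1; batch B deviates twice as much
    [5, 6, 3, 2, 6, 7, 4, 3],     # group means 4 and 5; same residual pattern
]
weights = batch_variance_weights(expr, group, batch)
means = weighted_group_means(expr[0], group, batch, weights)
print("batch weights:", weights)
print("weighted group means for gene 1:", means)
```

Here batch B gets a quarter of the weight of batch A, matching its four-fold larger residual variance.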


Thanks a lot Dr Johnson for all these details,

Best,

Eléonore

Dear Dr Johnson, 

Your answer is very detailed and comprehensive, but as a beginner I would like to discuss the possibility of using ComBat followed by limma. I have learned from several sources: the MOOC I linked in my answer (which is introductory but has a very nice part on batch effects), the ComBat paper, and other papers on batch effects. From these, I understand that, naively, the main effect of ComBat is a drastic adjustment of the batch means and variances ("shrinkage"). However, there is not much in the literature about the downstream analysis, i.e. when limma should or should not be used in conjunction with ComBat. Here your opinion, and that of any other specialists in the community, would be crucial for the discussion.

I understand that, as an expert and the creator of the methodology, you have explained this specific issue in detail. But can ComBat not always be used with limma? For instance, in one of my recent studies I used ComBat correction (from the package inSilicoMerging) to correct the batch effect arising from merging two datasets based on their common probesets. Various EDA plots clearly showed that after ComBat the studies were well mixed, whereas without any transformation there was a strong batch effect. I then used a limma moderated paired test to detect differential expression. So in this specific case, since the variances differ between the studies and the tumors are also heteroscedastic, should I avoid using limma after ComBat? Or is the issue unrelated to this, and instead about the "degrees of freedom" and the "overestimation of DE"?

Best,

Efstathios

svlachavas
@svlachavas-7225
Germany/Heidelberg/German Cancer Resear…

Dear Eleonore,

Firstly, I (naively) believe that it is crucial to use some diagnostic plots, such as plotMDS() from limma or a hierarchical clustering. These show how your batches cluster and group together before any statistical analysis. Consider a hypothetical case where your batch is known, the plots do not show a very strong effect for a specific batch, and the batch is distributed in a balanced way. In that case, the simplest and most efficient approach is to include the batch variable as a factor in your model matrix (as you mention). Nevertheless, your question is very general, and the answer is strongly connected with your experimental design and how you have pre-processed your data. Thus, in my opinion, there is no gold standard for batch effect correction; it depends on your specific analysis. You can also check the MOOC on Coursera (https://class.coursera.org/genstats-004), which addresses the matter in more detail.
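The diagnostic step can be mimicked outside R with the Python sketch below. It first simulates a hypothetical dataset with a strong batch shift and a balanced group layout. It then computes a simplified version of the "leading log-fold-change" distance that `plotMDS()` uses by default, i.e. the root-mean-square of the largest absolute log-fold-changes between each pair of samples. This sketch uses `top=50` instead of limma's default of 500, because the toy data have only 200 genes. Finally, it takes the first classical-MDS coordinate by power iteration. All simulation parameters are assumptions.

```python
import math
import random


def leading_logfc_distance(x, y, top):
    """RMS of the `top` largest absolute log-fold-changes between two samples."""
    diffs = sorted((abs(a - b) for a, b in zip(x, y)), reverse=True)[:top]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def classical_mds_dim1(dist, iters=500, seed=0):
    """First classical-MDS coordinate via power iteration on the double-centred matrix."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]  # row means (= column means, d2 is symmetric)
    total = sum(row) / n            # grand mean
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + total) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [t * math.sqrt(max(eigval, 0.0)) for t in v]


rng = random.Random(42)
batch = ["A"] * 4 + ["B"] * 4
group = ["ctrl", "trt"] * 4  # balanced across batches
samples = []
for b, g in zip(batch, group):
    profile = []
    for gene in range(200):
        value = rng.gauss(8.0, 0.3)
        if b == "B" and gene < 100:
            value += 3.0  # hypothetical strong batch shift
        if g == "trt" and 100 <= gene < 110:
            value += 1.0  # hypothetical treatment effect on a few genes
        profile.append(value)
    samples.append(profile)

dist = [[leading_logfc_distance(x, y, top=50) for y in samples] for x in samples]
coord = classical_mds_dim1(dist)
for b, g, c in zip(batch, group, coord):
    print(f"{b} {g}: dim1 = {c:+.2f}")
```

Here the first coordinate splits the samples by batch rather than by group. That pattern signals that the batch effect dominates and should be handled (for example by blocking) before testing.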

Best,

Efstathios
