It was pointed out to us that including the covariates in the correction step can introduce anti-conservative bias in two-step procedures (where you first clean with ComBat, then run a separate differential expression analysis). For now we recommend not including the variables, but we are working on a more complete solution that will allow the variable of interest to be used and will correct the downstream analysis. Stay tuned!
What is the current conclusion on this question? The manual I have (compiled May 22nd 2019) says to include the covariate of interest, but it doesn't seem to be included in the example.
Sorry for the confusion on this. We have been discussing how to handle covariates in two-step procedures: e.g. (step 1) batch adjustment, followed by (step 2) significance testing.
The proper way to handle a two-step batch adjustment and significance test is as follows:
- Step 1: Adjust for batch with ComBat and include any adjustment variables, including the covariate of interest.
- Step 2: Use a modified F- or t-test for significance. For example:
  - The F-test should use the modified F statistic F = ((rss0 - rss1)/(df1 - df0)) / (rss1/(n - df1 - nbatches)), where rss0 is the residual sum of squared errors (SSE) of the reduced model, rss1 is the SSE of the full model, df0 and df1 are the numbers of parameters in the reduced and full models, n is the number of samples, and nbatches is the number of batches. Compare this statistic against an F distribution with df1 - df0 and n - df1 - nbatches degrees of freedom.
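
To make the arithmetic of the modified F statistic concrete, here is a minimal Python sketch. It is not the sva implementation: the function name `modified_f_statistic` and the numbers in the example are made up for illustration, and it assumes you have already obtained rss0, rss1, df0 and df1 from fitting the reduced and full models on the ComBat-adjusted data.

```python
def modified_f_statistic(rss0, rss1, df0, df1, n, nbatches):
    """Return (F, numerator df, denominator df) for the modified F-test.

    Hypothetical helper for illustration only. The denominator degrees of
    freedom subtract nbatches in addition to the full-model parameter count.
    """
    num_df = df1 - df0
    den_df = n - df1 - nbatches
    if num_df <= 0 or den_df <= 0:
        raise ValueError("degrees of freedom must be positive")
    f_stat = ((rss0 - rss1) / num_df) / (rss1 / den_df)
    return f_stat, num_df, den_df


# Made-up example: 24 samples, 3 batches, reduced model with 1 parameter,
# full model with 2 parameters.
f_stat, num_df, den_df = modified_f_statistic(
    rss0=120.0, rss1=80.0, df0=1, df1=2, n=24, nbatches=3
)
print(f_stat, num_df, den_df)

# The p-value is the upper tail of an F distribution with (num_df, den_df)
# degrees of freedom. If SciPy is available (an assumption, it is not
# imported here), that would be scipy.stats.f.sf(f_stat, num_df, den_df).
```

In this example the denominator uses 24 - 2 - 3 = 19 degrees of freedom rather than the 22 an unmodified F-test would use, which is what makes the test more conservative.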
Publications in the literature discussing this issue are forthcoming and we will be changing the sva documentation to reflect this.
Here is the change that introduced this:
https://github.com/jtleek/sva-devel/commit/bdae45f74f6ef2d507939dc840cc17e78fb0a631