Question

sva::ComBat without covariate of interest?

2

Brent Pedersen ▴ 110

@brent-pedersen-4815

Last seen 10.4 years ago

United States

Older versions of the sva manual suggest including the covariate of
interest in the model when running ComBat, e.g. 'cancer' in this
manual: http://bioconductor.org/packages/2.9/bioc/vignettes/sva/inst/doc/sva.pdf

The devel and release versions:
http://bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf
say: "Just as with sva, we then need to create a model matrix for the
adjustment variables, but do not include the variable of interest." (emphasis mine).

Is this correct? Including the covariate of interest seems much more
sensible to me.

sva combat • 5.0k views

ADD COMMENT • link updated 10.4 years ago by W. Evan Johnson ▴ 870 • written 10.4 years ago by Brent Pedersen ▴ 110

0

Here is the change that introduced this:

https://github.com/jtleek/sva-devel/commit/bdae45f74f6ef2d507939dc840cc17e78fb0a631

ADD REPLY • link 10.4 years ago Brent Pedersen ▴ 110

0

W. Evan Johnson ▴ 870

@w-evan-johnson-5447

Last seen 9 months ago

United States

All,

Sorry for the confusion on this. We have been discussing how to handle covariates in two-step procedures, e.g. (step 1) batch adjustment, followed by (step 2) significance testing.

The proper way to handle a two-step batch/significance test is as follows:

- Step 1: Adjust for batch with ComBat and include any adjustment variables, including the covariate of interest.

- Step 2: Use a modified F- or t-test for significance. For example:

  - The F-test should use the modified statistic F = ((rss0 - rss1)/(df1 - df0)) / (rss1/(n - df1 - nbatches)), where rss0 is the residual sum of squares (SSE) of the reduced model, rss1 is the SSE of the full model, df0 and df1 are the numbers of parameters in the reduced and full models, n is the number of samples, and nbatches is the number of batches. Compare this against an F distribution with df1 - df0 and n - df1 - nbatches degrees of freedom.
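
As a rough illustration of the modified F statistic described above, here is a minimal sketch in Python (not R, and not code from the sva package). The function name `modified_f_statistic` and its argument names are hypothetical, chosen to mirror the symbols in the formula, and the input numbers are made up purely to show the arithmetic.

```python
def modified_f_statistic(rss0, rss1, df0, df1, n, nbatches):
    """Return (F, numerator df, denominator df) for the batch-aware F-test.

    rss0, rss1 : residual sums of squares of the reduced and full models
    df0, df1   : numbers of parameters in the reduced and full models
    n          : number of samples
    nbatches   : number of batches removed in the ComBat step
    """
    df_num = df1 - df0
    # The denominator loses nbatches extra degrees of freedom
    # compared with an ordinary F-test (which would use n - df1).
    df_den = n - df1 - nbatches
    if df_num <= 0:
        raise ValueError("full model must have more parameters than the reduced model")
    if df_den <= 0:
        raise ValueError("not enough samples for this model size and batch count")
    f_stat = ((rss0 - rss1) / df_num) / (rss1 / df_den)
    return f_stat, df_num, df_den


# Made-up inputs, for illustration only.
f_stat, df_num, df_den = modified_f_statistic(
    rss0=120.0, rss1=80.0, df0=1, df1=2, n=20, nbatches=3
)
print(f_stat, df_num, df_den)
```

The p-value would then be the upper tail of an F distribution with `df_num` and `df_den` degrees of freedom (for instance `scipy.stats.f.sf(f_stat, df_num, df_den)`, assuming SciPy is available). Because the denominator degrees of freedom are smaller than in an unmodified test, the statistic is smaller and the test more conservative.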

Publications in the literature discussing this issue are forthcoming and we will be changing the sva documentation to reflect this.

Thanks!

Evan

ADD COMMENT • link 10.4 years ago W. Evan Johnson ▴ 870

score 2 · Accepted Answer · 2014-11-17

2

Jeff Leek ▴ 650

@jeff-leek-5015

Last seen 4.1 years ago

United States

Hi Brent,

Good eye! Similar question here:

A: ComBat - Including variable of interest in model matrix?

It was pointed out to us that including the covariates in the correction step can lead to some anti-conservative bias in two-step procedures (where you first clean with ComBat, then do a separate differential expression analysis). So for now we are recommending not including the variables, but we are working on a more complete solution that will allow the variable of interest to be used and will correct the downstream analysis. Stay tuned!

Best,

Jeff

ADD COMMENT • link 10.4 years ago Jeff Leek ▴ 650

0

Hi,

What is the current conclusion on this question? The manual I have (compiled May 22nd 2019) says to include the covariate of interest, but then it doesn't seem to be included in the example?

Many thanks,

Lucy

ADD REPLY • link 5.9 years ago Lucy ▴ 60