Question

Sva and batch effect

0

Entering edit mode

Lenny186 • 0

@d0519f0c

Last seen 2.2 years ago

France

Hi,

I'm trying to understand how sva works and I have two pieces of codes and I want to be sure to understand the differences between them. I added # with my comment, but please if my terminology is wrong or informations are not correct, please correct it so I could understand better :

  Code1:
    library(limma)
    mod <- model.matrix(~sex) #model could be explained by sex
    svafit <- sva(geneExpression,mod)  # we suspect some batch effect but we don't know from which variable it comes, so we use sva
    svaX<-model.matrix(~sex+svafit$sv) # we adjust with the sva news covariates
    lmfit <- lmFit(geneExpression,svaX)# we fit our original data with adjusted model
    tt<- lmfit$coef[,2]*sqrt(lmfit$df.residual)/(2*lmfit$sigma) # we calculated tt and then conclude with pvalues for sex variable
    pvals=2*(1-pt(abs(tt),lmfit$df.residual[1])

    Code2
    library(limma)
    mod <- model.matrix(~sex+batch) # we explain our model with sex and some batch effect
    fit <- lmFit(geneExpression,mod) # we fit our model
    ses <- fit$stdev.unscaled[,2]*fit$sigma
    ttest <- fit$coef[,2]/ses # and conclude tt and then pvalue for sex variable
    pvals <- 2*pt(-abs(ttest),fit$df)

So when it comes to the conclusion : what is the difference between these two codes ? (Warning my answers could really be wrong but this is my understanding) : I would say the first code "remove" batch effect from the model then we can estimate more accurately sex effect. While the second to do not remove it but can allow to estimate the correspondant p-value of the batch effect, in the detrimental of sex effect estimation.

Please correct if wrong (surely :( )

Thanks a lot for your precious help

Lenny

limma BatchEffect sva • 1.2k views

ADD COMMENT • link updated 2.2 years ago by Gordon Smyth 52k • written 2.2 years ago by Lenny186 • 0

score 3 · Accepted Answer · 2023-01-31

3

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 12 hours ago

United States

You shouldn't be computing regular t-statistics to begin with. In addition, sva is intended to adjust for unobserved variability, and a batch effect is clearly not that. You could hypothetically use ComBat if you think there is more to the batch than a shift in means, but in general the second way is how you should fit the model, but by following lmFit with an eBayes and then a topTable to extract the genes.

ADD COMMENT • link 2.2 years ago James W. MacDonald 68k

2

Entering edit mode

(+1). I find that ComBat tends to introduce spurious differential expression so, you think there's a batch effect affecting the variances as well as means, then I'd recommend limma's array weights together with batch in the linear model rather than using ComBat.

ADD REPLY • link 2.2 years ago Gordon Smyth 52k