surrogate variables from sva independent of model variables?
0
1
Entering edit mode
pixie ▴ 10
@pixie-13383
Last seen 7.4 years ago

Hi,

I'm trying to get understand the full vs null model specifications in the sva package.
My understanding is the following:

full model = Variable-of-interest + all confounders
null model = all confounders

The resulting SVs should then capture noise that is not captured by my confounders. Is this correct?
What I observe though is that when I run sva, the resulting SVs relate stronger to the confounders, when these were included in the model, vs a scenario, where I didn't include the confounders in the models.

Below a reproducible example using the bladderbatch data. "batch" is in this case a measured confounder (not my variable-of-interest).

library(sva)
library(bladderbatch)
data(bladderdata)
library(pamr)
library(limma)
pheno = pData(bladderEset)
edata = exprs(bladderEset)

mod = model.matrix(~as.factor(cancer), data=pheno)
mod0 = model.matrix(~1,data=pheno)
n.sv = num.sv(edata,mod,method="leek")
n.sv
[1] 2

svobj = sva(edata,mod,mod0,n.sv=n.sv)

Number of significant surrogate variables is:  2 
Iteration (out of 5 ):1  2  3  4  5  

Now, let's see how the derived SVs relate to the batch variable (a covariate not included in the model).

summary(aov(svobj$sv ~ as.factor(pheno$batch)))

 Response 1 :
                       Df  Sum Sq  Mean Sq F value    Pr(>F)    
as.factor(pheno$batch)  4 0.34913 0.087284  6.9734 0.0001426 ***
Residuals              52 0.65087 0.012517                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Response 2 :
                       Df  Sum Sq  Mean Sq F value   Pr(>F)   
as.factor(pheno$batch)  4 0.29016 0.072541  5.3141 0.001153 **
Residuals              52 0.70984 0.013651                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Ok, let's include "batch" as a covariate and see whether the derived SVs are independent of that covariate.

mod = model.matrix(~as.factor(cancer) + as.factor(batch), data=pheno)
mod0 = model.matrix(~as.factor(batch),data=pheno)
svobj = sva(edata,mod,mod0,n.sv=n.sv)

Number of significant surrogate variables is:  2 
Iteration (out of 5 ):1  2  3  4  5  

summary(aov(svobj$sv ~ as.factor(pheno$batch)))

 Response 1 :
                       Df  Sum Sq  Mean Sq F value    Pr(>F)    
as.factor(pheno$batch)  4 0.36353 0.090882   7.425 8.269e-05 ***
Residuals              52 0.63647 0.012240                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Response 2 :
                       Df  Sum Sq  Mean Sq F value    Pr(>F)    
as.factor(pheno$batch)  4 0.42935 0.107339  9.7812 5.631e-06 ***
Residuals              52 0.57065 0.010974                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 

I don't get why the association between the SVs and batch is stronger in the second example, in which batch was included in the models. I thought batch should have been "protected". (note: in my own data, the covariate is age; I just used batch here for reproducibility)

 

Thanks, Esther

sva • 1.5k views
ADD COMMENT
0
Entering edit mode

I have the same problem. Did you manage to find an answer Esther?

ADD REPLY

Login before adding your answer.

Traffic: 669 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6