Dear all,
I am using sva to detect hidden batch effects and then remove them by linear regression. I am unsure how to set the mod and mod0 matrices properly for our application, and any guidance would be appreciated.
Setting 1
Suppose I have two known covariates, age and gender. According to the tutorial [1], it seems I should set the mod and mod0 matrices as:
mod = model.matrix(~Diagnosis+age+gender, data=metaExp)
mod0 = model.matrix(~age+gender+1, data=metaExp)
# Then the sva algorithm is run:
SVAsol = sva(dat=t(X), mod=mod, mod0=mod0, n.sv=10)
However, with this setting I found that the hidden (surrogate) variables were still significantly correlated with age and gender, whereas I expected no such correlation.
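A minimal sketch of one way to run such a check in base R. The helper name `sv_covariate_pvalues` and the simulated `sv` and `meta` objects are hypothetical stand-ins for `SVAsol$sv` and `metaExp`. Each surrogate variable is regressed on one covariate at a time and the overall F-test p-value is recorded.

```r
# Hypothetical helper: test each surrogate variable against each covariate.
# sv:   numeric matrix, samples in rows, surrogate variables in columns
# meta: data frame with one row per sample
sv_covariate_pvalues <- function(sv, meta, covariates) {
  pvals <- matrix(NA_real_, nrow = ncol(sv), ncol = length(covariates),
                  dimnames = list(paste0("SV", seq_len(ncol(sv))), covariates))
  for (j in seq_len(ncol(sv))) {
    for (v in covariates) {
      # Simple regression of one surrogate variable on one covariate
      fit <- lm(sv[, j] ~ meta[[v]])
      f <- summary(fit)$fstatistic
      pvals[j, v] <- pf(f[["value"]], f[["numdf"]], f[["dendf"]],
                        lower.tail = FALSE)
    }
  }
  pvals
}

# Simulated stand-ins (assumptions for illustration, not real data)
set.seed(1)
n <- 60
meta <- data.frame(age = rnorm(n, 50, 10),
                   gender = factor(sample(c("F", "M"), n, replace = TRUE)))
sv <- cbind(scale(meta$age) + rnorm(n, sd = 0.3),  # built to track age
            rnorm(n))                              # unrelated noise

pvals <- sv_covariate_pvalues(sv, meta, c("age", "gender"))
print(signif(pvals, 3))
```

With real output the call would be `sv_covariate_pvalues(SVAsol$sv, metaExp, c("age", "gender"))`, assuming the rows of `metaExp` are in the same sample order as the columns of `t(X)`.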
Setting 2
Alternatively, one of my colleagues suggested setting the mod0 matrix as:
mod0 = model.matrix(~1, data=metaExp)
With this setting, I found no correlation between the hidden variables and the known covariates.
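To make the structural difference between the two settings concrete, here is a small sketch on made-up data. The `toy` data frame and its values are hypothetical and only show which columns each design matrix contains. In Setting 1 the only column of mod missing from mod0 is the Diagnosis term. In Setting 2 mod0 holds the intercept alone, so age and gender sit in mod but not in mod0, just like Diagnosis.

```r
# Toy metadata (hypothetical values, only to show the matrix columns)
toy <- data.frame(
  Diagnosis = factor(c("control", "case", "control", "case")),
  age       = c(34, 51, 47, 62),
  gender    = factor(c("F", "M", "M", "F"))
)

mod       <- model.matrix(~ Diagnosis + age + gender, data = toy)
mod0_full <- model.matrix(~ age + gender, data = toy)  # Setting 1
mod0_int  <- model.matrix(~ 1, data = toy)             # Setting 2

# Columns present in the full model but absent from each null model
only_in_mod_setting1 <- setdiff(colnames(mod), colnames(mod0_full))
only_in_mod_setting2 <- setdiff(colnames(mod), colnames(mod0_int))

print(colnames(mod))
print(only_in_mod_setting1)
print(only_in_mod_setting2)
```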
Could anyone tell me the appropriate way to set the mod and mod0 matrices? Also, could you comment on how these two settings differ and how each one affects the result?
[1] https://bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf
Regards, Hank