In my dataset, I have a covariate, cell, that I wish to correct for, and I also want to incorporate a surrogate variable, SV1, for hidden batch effects. My question is: if I pass cell as an adjustment variable to sva and generate SV1, do I now remove cell from my model design?
library(DESeq2)
library(sva)

# Generate surrogate variable for hidden batch effect
dds = DESeq(dds, parallel=T)
vst = vst(dds)
dat = assay(vst)
mod = model.matrix(~ cell + day + trt, colData(dds))   # full model
mod0 = model.matrix(~ cell, colData(dds))              # null model: cell passed as adjustment variable
n.sv = num.sv(dat, mod, method="leek")                 # is 1
svseq = svaseq(dat, mod, mod0, n.sv=n.sv)
dds$SV1 = svseq$sv[,1]
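# For comparison, the sva section of the DESeq2 vignette runs svaseq on
# normalized counts (with low-count rows filtered out) rather than on VST
# values. A sketch of that variant, reusing mod and mod0 from above
# (dat2 and svseq2 are just names I'm using here):
dat2 = counts(dds, normalized=TRUE)
dat2 = dat2[rowMeans(dat2) > 1, ]
svseq2 = svaseq(dat2, mod, mod0, n.sv=num.sv(dat2, mod, method="leek"))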
# Is this right?
dds1 = dds
design(dds1) = ~ SV1 + day + trt
dds1 = DESeq(dds1, parallel=T)
vsd1 = vst(dds1)
# Or this?
dds2 = dds
design(dds2) = ~ SV1 + cell + day + trt
dds2 = DESeq(dds2, parallel=T)
vsd2 = vst(dds2)
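# (Sketch) A quick check of how much of the cell effect SV1 is capturing,
# which is what makes me unsure whether both belong in the same design:
summary(lm(dds$SV1 ~ dds$cell))
boxplot(dds$SV1 ~ dds$cell, xlab="cell", ylab="SV1")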
# having a look see
plotPCA(vst, "cell")
assay(vsd1) = limma::removeBatchEffect(assay(vsd1), covariates=vsd1$SV1)
plotPCA(vsd1, "cell") # it definitely modifies cell clustering
assay(vsd2) = limma::removeBatchEffect(assay(vsd2), batch=vsd2$cell,
                                       covariates=vsd2$SV1)
plotPCA(vsd2, "cell") # doesn't look right
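One more way to do the visual check, if I'm reading the limma docs right, is to hand removeBatchEffect the design of the factors I want to keep, so the day and trt effects aren't regressed out along with SV1. A sketch, reusing dds1 from above (vsd1b and mm are just names I'm using here):
vsd1b = vst(dds1)
mm = model.matrix(~ day + trt, colData(dds1))   # factors to protect
assay(vsd1b) = limma::removeBatchEffect(assay(vsd1b), covariates=vsd1b$SV1, design=mm)
plotPCA(vsd1b, "cell")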
Would it be better to not pass cell to sva as an adjustment variable, and leave it in the model?
mod = model.matrix(~ cell + day + trt, colData(dds))
mod0 = model.matrix(~ 1, colData(dds))   # intercept-only null, i.e. cell not given to sva as an adjustment variable
...
design(dds1) = ~ SV1 + cell + day + trt
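In case it helps, this is roughly how I'd compare the surrogate variable from the two setups (a sketch, reusing dat and svseq from above; svseq_noadj is just a name I'm using here):
svseq_noadj = svaseq(dat, mod, mod0, n.sv=1)   # mod/mod0 as defined just above
cor(svseq$sv[,1], svseq_noadj$sv[,1])          # how similar are the two SV1s?
plot(svseq$sv[,1], svseq_noadj$sv[,1],
     xlab="SV1 (cell as adjustment variable)",
     ylab="SV1 (intercept-only null)")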