Hello,
I am analyzing RNA-seq data using DESeq2 I have 5 groups of samples each with 4 replicates.
Below is a PCA plot of the VST transformed value: as shown in the PCA plot, replicate 1 (R1) are separate from the others on the upper half of the PCA plot. There are some clustering of the replicate 2 samples as well (triangles) at the bottom.
I ran deseq2 with a design of ~ batch + group
I tried to visualize the PCA plot after batch effect was modeled following this post, using:
log(counts(dds, normalized=TRUE) + 1) - log(t( t(assays(dds)[["mu"]]) / sizeFactors(dds) ) + 1)
I thought some hidden batch effect is still not corrected. So, I decided to run SVA to get two surrogate variables and include that variables into the model as ~ SV1 + SV2 + group
dat <- counts(dds, normalized = TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx, ]
# specify full model which include the variable of interest
mod <- model.matrix(~ group, data=colData(dds))
# null model is the model with only intercept term
mod0 <- model.matrix(~ 1, data=colData(dds))
# coded this way, svseq is the 2 latent variables not associated with group
svseq <- svaseq(dat, mod, mod0, n.sv = 2)
SV1 = svseq[,1]
SV2 = svseq[,2]
However, I think there is still no clear separations between the groups and some batch effects are still there.
Any advice would be much appreciated. Thank you in advance!