Dear Bioconductor community,
I'm currently working on an RNA-seq differential expression project with an sRNA-seq dataset (for miRNA differential expression analysis) and an mRNA-seq dataset (for mRNA differential expression analysis). My condition of interest has three levels with n=8, n=5 and n=6.
Anyhow, the question arose whether I should use SVA to account for potential batch effects. I don't expect severe batch effects (at least not from library prep and sequencing). However, since this is a non-model organism study (veterinary field), I thought SVA might account for mixed breeds within the groups or other unknown sources of unwanted variation in gene expression.
Thus I would love to get your opinion on that question. Is it a problem to apply SVA to a data set with a small sample size?
To investigate the effect of SVA on the dataset I generated two PCAs: one before and one after subtracting the significant surrogate variables (4 SVs were detected) from the dataset.
The PCA plot on the right illustrates the dataset after removal of 4 surrogate variables.
As the lists of differentially expressed genes/miRNAs obtained with and without SVA are unfortunately not subsets of each other but genuinely different, I really don't know which path to take or how to justify it.
Are there any possible analyses/quality controls I could run to answer my question?
And an additional small question: Would you suggest adding the RNA extraction day as a covariate in the linear model? (One sample from each of the three conditions was always extracted on the same day, and I have these batch dates.)
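To make the extraction-day question concrete, here is a minimal sketch in base R of the design I have in mind. The sample sheet is a toy one: the column name extraction_day, the level names and the assignment of samples to days are all made up for illustration and are not my real metadata. It only checks that such a design is estimable and how many residual degrees of freedom remain.

```r
# Toy sample sheet: 19 samples, three condition levels (n = 8, 5, 6).
# All column names, level names and the day assignment are hypothetical.
coldata <- data.frame(
  condition      = factor(rep(c("A", "B", "C"), times = c(8, 5, 6))),
  sex            = factor(rep(c("f", "m"), length.out = 19)),
  extraction_day = factor(c(1:8, 1:5, 1:6))
)

# Extraction day enters as a blocking factor before the terms of interest.
design <- model.matrix(~ extraction_day + sex + condition, data = coldata)

# The design has to be full rank; otherwise day and condition are confounded.
full_rank <- qr(design)$rank == ncol(design)

# Residual degrees of freedom left over for estimating the variance.
resid_df <- nrow(design) - ncol(design)
```

In this toy layout the design is estimable, but the day factor alone uses 7 coefficients, so only 8 residual degrees of freedom remain from 19 samples. That cost is part of why I am unsure whether to include it.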
Thank you very much for your help
-Matt
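Edit: This is the code I use to subtract the surrogate variables (the function was published by Jaffe et al. 2015):
# Regress y on the model plus the surrogate variables, then subtract only the SV part
cleaningP <- function(y, mod, svaobj, P = ncol(mod)) {
  X <- cbind(mod, svaobj$sv)            # full design: known covariates + SVs
  Hat <- solve(t(X) %*% X) %*% t(X)     # least-squares projection
  beta <- Hat %*% t(y)                  # coefficients, one column per gene
  # remove the fitted SV contribution; effects of the first P columns are kept
  cleany <- y - t(as.matrix(X[, -c(1:P)]) %*% beta[-c(1:P), ])
  return(cleany)
}
# mat (genes x samples) and svseq (the SVA result) were created earlier, not shown
mod <- model.matrix(~ sex + condition, data = colData(dds2))
cleanp <- cleaningP(mat, mod, svseq)
pca <- prcomp(t(cleanp))                # PCA with samples as rows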
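For reference, a small self-contained sketch on simulated data (not my real dataset) of what this subtraction does. The hidden factor, the gene count and the fake_sva list are assumptions for illustration only; fake_sva merely mimics the $sv slot of an SVA result. The function is restated in a slightly more defensive form (drop = FALSE) so the block runs on its own.

```r
set.seed(1)
n <- 19; g <- 200
condition <- factor(rep(c("A", "B", "C"), times = c(8, 5, 6)))
mod <- model.matrix(~ condition)
hidden <- rnorm(n)  # stands in for a single surrogate variable

# genes x samples matrix: noise plus a hidden-factor effect on every gene
y <- matrix(rnorm(g * n), g, n) + outer(rnorm(g, sd = 3), hidden)

cleaningP <- function(y, mod, svaobj, P = ncol(mod)) {
  X <- cbind(mod, svaobj$sv)
  beta <- solve(crossprod(X), t(X) %*% t(y))
  y - t(X[, -seq_len(P), drop = FALSE] %*% beta[-seq_len(P), , drop = FALSE])
}

fake_sva <- list(sv = matrix(hidden, ncol = 1))  # mimics the $sv slot only
cleaned <- cleaningP(y, mod, fake_sva)

# Share of total variance carried by PC1, before and after cleaning
pc1_share <- function(m) {
  p <- prcomp(t(m))
  p$sdev[1]^2 / sum(p$sdev^2)
}
before <- pc1_share(y)
after  <- pc1_share(cleaned)

# Refit the full model on the cleaned data: the SV coefficients should vanish
X <- cbind(mod, hidden)
refit <- solve(crossprod(X), t(X) %*% t(cleaned))
sv_coef <- refit[ncol(mod) + 1, ]
```

In this simulation PC1 is dominated by the hidden factor before cleaning and much less so afterwards, and refitting the model on the cleaned matrix gives SV coefficients that are zero up to rounding. It says nothing about whether the SVs estimated on a small real dataset capture unwanted rather than biological variation, which is my actual question.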
Thank you very much for your input! Highly appreciated.