Question

Applying SVA to RNA-Seq dataset

L_K • 0

Dear Bioconductor community,

I'm currently working on an RNA-seq differential expression project with an sRNA-seq dataset (for miRNA differential expression analysis) and an mRNA-seq dataset (for mRNA differential expression analysis). My condition of interest has three levels with n=8, n=5 and n=6.

Anyhow, the question arose whether I should use SVA to account for potential batch effects. I don't expect severe batch effects (at least not from library prep and sequencing). However, since this is a non-model organism study (veterinary field), I thought SVA might account for mixed breeds within the groups or other unknown sources of unwanted variation in gene expression.

Thus I would love to get your opinion on that question. Is it a problem to apply SVA to a data set with a small sample size?

To investigate the effect of SVA on the dataset, I generated two PCA plots: one of the original data and one after subtracting the significant surrogate variables (4 SVs were detected) from the dataset.

The PCA plot on the right illustrates the dataset after removal of 4 surrogate variables.

Unfortunately, the differentially expressed genes/miRNAs from the two analyses differ, and neither set is a subset of the other, so I really don't know which path to take or how to justify it.

Are there any possible analyses/quality controls I could run to answer my question?

And an additional small question: would you suggest adding the RNA extraction day as a covariate in the linear model? (One sample from each of the three conditions was always extracted on a given day, and I have these batch dates.)

Thank you very much for your help

-Matt

Edit: this is the code I use to subtract the surrogate variables (the function was published by Jaffe et al., 2015):

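# Regress the surrogate variables out of y (features x samples).
# mod: design matrix of the known covariates; svaobj: object with the SVs in $sv.
# P: number of leading columns of X whose effects are kept (default: all of mod).
cleaningP <- function(y, mod, svaobj, P = ncol(mod)) {
  X <- cbind(mod, svaobj$sv)
  Hat <- solve(t(X) %*% X) %*% t(X)
  beta <- Hat %*% t(y)
  # Subtract only the fitted contribution of the surrogate variables.
  cleany <- y - t(as.matrix(X[, -c(1:P)]) %*% beta[-c(1:P), ])
  return(cleany)
}

# mat and svseq are not defined in this snippet; presumably the expression
# matrix and the svaseq result computed earlier.
mod <- model.matrix(~sex + condition, data = colData(dds2))
cleanp <- cleaningP(mat, mod, svseq)

pca <- prcomp(t(cleanp))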
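
One way to sanity-check a regression-based cleaning step like the one above is to run it on simulated data where the hidden factor is known. The sketch below is an illustration only: the data, the three-level `condition` factor, and the single simulated factor standing in for `svseq$sv` are all made up, and the function is re-declared (as `cleaningP`, with `drop = FALSE` added) so the snippet runs on its own in base R.

```r
# Simulated check of the regression-based cleaning step (base R only).
set.seed(1)
n_genes <- 200
n_samples <- 19

# Hypothetical design: three conditions with n = 8, 5 and 6.
condition <- factor(rep(c("A", "B", "C"), times = c(8, 5, 6)))
mod <- model.matrix(~condition)

# One simulated hidden factor standing in for a surrogate variable.
sv <- matrix(rnorm(n_samples), ncol = 1)
svaobj <- list(sv = sv)  # mimics only the $sv element of an svaseq() result

# Expression matrix (genes x samples): noise plus a per-gene hidden-factor effect.
gene_effect <- rnorm(n_genes, sd = 2)
y <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples) +
  gene_effect %*% t(sv)

cleaningP <- function(y, mod, svaobj, P = ncol(mod)) {
  X <- cbind(mod, svaobj$sv)
  beta <- solve(t(X) %*% X) %*% t(X) %*% t(y)
  y - t(as.matrix(X[, -c(1:P)]) %*% beta[-c(1:P), , drop = FALSE])
}

clean <- cleaningP(y, mod, svaobj)

# Mean absolute correlation of each gene with the hidden factor.
mean_abs_cor <- function(m) mean(abs(apply(m, 1, cor, y = sv[, 1])))
before <- mean_abs_cor(y)
after <- mean_abs_cor(clean)
c(before = before, after = after)
```

If the cleaning works as intended, `after` should be much smaller than `before`. It need not be exactly zero, because the retained condition effects can correlate with the hidden factor by chance.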

DESeq2 sva svaseq batch effect correction • 3.9k views

score 1 · Answer 1 · 2018-01-24

Ryan C. Thompson ★ 7.9k

Icahn School of Medicine at Mount Sinai…

It looks to me like SVA is helping quite a bit for this data set. I always like to plot the surrogate variables against any known confounding factors (such as RNA extraction date, in your case). If you can show that SVA is capturing the variation due to known confounders, that gives you confidence that SVA is capturing real effects in your data that should be corrected for.
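
A minimal sketch of that check, using simulated values rather than the poster's data. Everything here is an assumption made for illustration: `extraction_day` is an invented factor, `sv` stands in for the `$sv` matrix of an svaseq result, and SV1 is deliberately constructed to follow the extraction day so that the association is visible.

```r
# Compare surrogate variables with a known categorical factor (base R only).
set.seed(42)
n_samples <- 19

# Hypothetical known batch factor, e.g. RNA extraction day.
extraction_day <- factor(rep(paste0("day", 1:7), length.out = n_samples))

# Stand-in for the surrogate variable matrix (samples x 4 SVs).
sv <- matrix(rnorm(n_samples * 4), ncol = 4,
             dimnames = list(NULL, paste0("SV", 1:4)))
# Build a day effect into SV1 purely for demonstration.
sv[, "SV1"] <- sv[, "SV1"] + 2 * as.numeric(extraction_day)

# Fraction of each SV's variance explained by the known factor.
r2 <- apply(sv, 2, function(v) summary(lm(v ~ extraction_day))$r.squared)
round(r2, 2)

# Visual check for the SV most associated with the factor.
if (interactive()) {
  top <- names(which.max(r2))
  boxplot(sv[, top] ~ extraction_day,
          xlab = "RNA extraction day", ylab = top)
}
```

An SV with a high R² against a known factor is likely capturing that factor; with real data, the same loop can be run over every known covariate in the sample table.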

Other things you can plot your SVs against include RNA QC statistics like RIN, total read count, and percent of reads aligned to genes.
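
For continuous QC statistics, a correlation table is a quick complement to the plots. The sketch below again uses simulated values: the `qc` columns (`RIN`, `total_reads`, `pct_in_genes`) are hypothetical names, `sv` stands in for the surrogate variable matrix, and SV2 is deliberately tied to RIN for demonstration.

```r
# Correlate surrogate variables with continuous QC metrics (base R only).
set.seed(7)
n_samples <- 19

# Hypothetical per-sample QC table.
qc <- data.frame(
  RIN          = runif(n_samples, 6, 10),
  total_reads  = rnorm(n_samples, mean = 2e7, sd = 3e6),
  pct_in_genes = runif(n_samples, 60, 90)
)

# Stand-in for the surrogate variable matrix (samples x 4 SVs).
sv <- matrix(rnorm(n_samples * 4), ncol = 4,
             dimnames = list(NULL, paste0("SV", 1:4)))
# Build a RIN effect into SV2 purely for demonstration.
sv[, "SV2"] <- sv[, "SV2"] - 2 * scale(qc$RIN)[, 1]

# Rank correlation is less sensitive to outlying samples than Pearson.
cor_mat <- cor(sv, qc, method = "spearman")
round(cor_mat, 2)
```

Rows are SVs and columns are QC metrics. With only 19 samples, moderate correlations can arise by chance, so entries are best read alongside the scatter plots rather than in isolation.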