Entering edit mode
Hi guys,
I'm using SVA package for identifying unwanted variation and then removing batch effects from my RNA-Seq data.
Let's say I found my ten SV with sva function; my question is: how can I calculate the proportion of variance explained for each SV, as for PCs in PCA?
Thanks a lot,
Mattia
Thanks Jeff for your precious suggestion. Actually, I tried to do something similar to PCA, using a part of your "sva code":
dat <- data_normalized
pprob <- svaobj$pprob.gam*(1-svaobj$pprob.b)
dats <- dat*pprob
dats <- dats - rowMeans(dats)
uu <- eigen(t(dats)%*%dats)
I added this line:
uu_val <-uu$values / sum(uu$values)
which could express the fraction of each eigenvalue over the total sum. Can I interpret the object uu_val as the "Percent of explained variance" for each SV?
I finally produced the figure where each SV is plotted against the corrisponding uu_val.
x_val <- 1:ncol(dats)
expl_var_plot <- as.data.frame(cbind(x_val,uu_val))
ggplot(expl_var_plot, aes(x_val,uu_val)) +
geom_point(size=3,pch=19, color="blue") +
scale_color_gradient() +
geom_text(aes(label=rownames(expl_var_plot)),hjust=0.5,vjust=2,size=3) +
xlab("SV") +
ylab("uu$values / sum(uu$values)")
Thanks for your time and consideration.
Mattia
Jeff,
I followed your suggestion. I created a simulated RNA-Seq dataset (15000 genes and 30 samples for each of the 2 conditions) with 3 batch effects (strong, medium and weak effects). Then, after VST normalization, I calculated:
PC_res <- prcomp(data_normalized)
expl_var_pca <- (PC_res$sdev)^2 / sum(PC_res$sdev^2)
Then, I calculated the correlation between uu_val2 vs expl_var_pca which resulted equal to 0.95.
Moreover, I found that the first 2 SVs (corresponding to the first 2 uu_val) highly correlate (cor>0.9) with the strong and medium batches, whereas I need 6 SVs to correct the weak batch.
So, it seems to me that uu_val2 behaves like Percentage of Explained variance in PCA.
Thank you again!