I have successfully used RUVSeq to correct samples from a "classical" control vs treatment experiment for batch effects using RUVr, RUVg, RUVs and svaseq and all gave similar results, which were satisfactory.
Now I want to use RUVSeq in a clustering problem and I understand I can only use RUVs.
I obtained public RNA-seq data from various tissues with replicates. After running RUVs, the resulting PCA doesn't separate samples by tissue, whereas both the rlog-transformed uncorrected counts and the svaseq-corrected counts gave the expected clustering by tissue.
My question is: how to use RUVs in clustering problems? My code:
counts_norm = round(counts_deseq, digits=0)
differences <- makeGroups(groups)
batch_ruv_reps <- RUVs(counts_norm, rownames(counts_norm), k=3, differences)
counts_ruvseq = batch_ruv_reps$normalizedCounts
# plot PCA using this matrix
Here, groups is a vector of tissue names and counts_deseq is a matrix of counts normalized using DESeq2's rlog function.
differences is:
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    2    3   -1   -1   -1   -1   -1
[2,]    4    5   -1   -1   -1   -1   -1   -1
[3,]    6    7    8    9   -1   -1   -1   -1
[4,]   23   24   25   26   27   28   29   30
[5,]   10   11   12   -1   -1   -1   -1   -1
[6,]   15   16   17   18   19   20   21   22
[7,]   13   14   -1   -1   -1   -1   -1   -1
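To make the layout of this matrix explicit, here is a minimal base-R sketch of how a replicate-group matrix of this shape can be built from a vector of labels. It is only an illustration: the tissue names are hypothetical, `differences_sketch` is a made-up variable name, and the loop mimics the shape of the matrix above rather than reproducing RUVSeq's own code. Each row holds the column indices of the samples from one tissue, padded with -1.

```r
# Hypothetical tissue labels, one per sample (column) of the count matrix.
groups <- c("brain", "brain", "brain", "liver", "liver", "heart")

# One row per tissue (in alphabetical order of the labels), one column per
# replicate slot; unused slots stay at -1.
lv <- levels(factor(groups))
differences_sketch <- matrix(-1, nrow = length(lv), ncol = max(table(groups)))
for (i in seq_along(lv)) {
  idx <- which(groups == lv[i])            # sample indices for this tissue
  differences_sketch[i, seq_along(idx)] <- idx
}
differences_sketch
```

With these labels the rows correspond to brain, heart and liver, so a tissue with a single sample ends up as a row containing one index followed by -1 padding.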
I changed k and didn't get better results. Is there anything else I should be doing?
Thank you.
Thanks for your reply. Contrary to what I said, I'm actually using the DESeq2 normalized counts, not rlog. I also tried using raw counts with the betweenLaneNormalization function, but got similar results.
I am going to try what you suggested.