I have used the package successfully before; however, in this case I can't understand the results of the `sva` function.
I calculated the number of surrogate variables with both methods provided in num.sv:

    > num.sv(v1$E, design, "leek")
    [1] 2
    > num.sv(v1$E, design, "be")
    [1] 8

I took the lower number and called the `sva` function with it and the design matrix:

    design <- structure(c(
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0,
      1, 1, 1, 1, 1, 1, 1, 1, 1, 1
    ), .Dim = c(60L, 6L), .Dimnames = list(
      c("A01", "A02", "A03", "A04", "A05", "A06", "A07", "A08", "A09",
        "B10", "B11", "B12", "B13", "B14", "B15", "B16", "B17", "B18",
        "C19", "C20", "C21", "C22", "C23", "C24", "C25", "C26", "C27", "C28", "C29",
        "E41", "E42", "E43", "E44", "E45", "E46", "E47", "E48", "E49", "E50", "E51", "E52",
        "H72", "H73", "H74", "H75", "H76", "H77", "H78", "H79", "H80",
        "I89", "I90", "I91", "I92", "I93", "I94", "I95", "I96", "I97", "I98"),
      c("AH", "Non.responders", "Responders", "C.Comp", "ASH", "Normal")))
    sv1 <- sva(v1$E, design, n.sv = 2)

However, the surrogate variables are:

    # The sv1$sv is:
          [,1] [,2]
     [1,]    1    0
     [2,]    0    1
     [3,]    0    0
     [4,]    0    0
     [5,]    0    0
     [6,]    0    0
     [7,]    0    0
     [8,]    0    0
     [9,]    0    0
    [10,]    0    0
     [i,]    0    0

If I increase the number of surrogate variables, each one affects a single sample, following the diagonal. I find it quite strange that there would be a batch effect for just one sample. I plotted the MDS of the samples and couldn't see a batch effect for any of those samples. However, if I change the design (collapsing the previous columns into a single variable), I get other (more typical) surrogate variables.
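
The pattern described above (each surrogate variable being nonzero for exactly one sample) can be checked programmatically rather than by eye. This is a minimal sketch in base R; `sv_toy` is a hypothetical stand-in for `sv1$sv`, and the tolerance `tol` is an arbitrary assumption.

```r
# Hypothetical stand-in for sv1$sv: 10 samples, 2 surrogate variables
sv_toy <- matrix(0, nrow = 10, ncol = 2)
sv_toy[1, 1] <- 1
sv_toy[2, 2] <- 1

# Arbitrary threshold below which a loading is treated as zero (assumption)
tol <- 1e-8

# Number of samples with a non-negligible loading on each surrogate variable
n_nonzero <- colSums(abs(sv_toy) > tol)

# Sample index carrying the largest absolute loading for each surrogate variable
which_sample <- apply(abs(sv_toy), 2, which.max)

# TRUE where a surrogate variable is effectively a single-sample indicator
is_indicator <- n_nonzero == 1
```

Replacing `sv_toy` with the real `sv1$sv` would show whether every column is a single-sample indicator and which samples are involved.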
Is there something in the design or in the way I call `sva` that would explain this? Or is it simply correct and I am overthinking it? I think it might be related to "sva: No significant surrogate variables".
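
One thing that could be inspected is how the design relates to an intercept-only null model. The sketch below uses a hypothetical toy grouping (`group`, 3 groups of 4 samples) rather than the real data, and only base R. It builds a design with one indicator column per group and no intercept column, similar in shape to the one above, and checks whether the intercept lies in its column space. Whether the null model `sva` uses when `mod0` is omitted suits such a design is an assumption worth checking against the package documentation.

```r
# Hypothetical toy grouping: 3 groups of 4 samples each
group <- factor(rep(c("g1", "g2", "g3"), each = 4))

# Design with one indicator column per group and no intercept column
mod <- model.matrix(~ 0 + group)

# Intercept-only null model: a single column of ones, like rep(1, n)
mod0 <- model.matrix(~ 1, data = data.frame(group = group))

# Rank of the design alone and of the design with the intercept appended
rank_mod <- qr(mod)$rank
rank_both <- qr(cbind(mod, mod0))$rank

# Does any single column of the design equal the intercept?
has_intercept_col <- any(apply(mod, 2, function(x) all(x == 1)))

# Is the intercept a linear combination of the design columns?
intercept_in_span <- rank_both == rank_mod

# Would the first design column on its own act as an intercept?
first_col_is_constant <- all(mod[, 1] == 1)

# With the Bioconductor package sva installed, both models can be passed
# explicitly (dat is a genes-by-samples matrix, not defined here):
# sv <- sva::sva(dat, mod, mod0, n.sv = 2)
```

In this toy case the intercept is in the span of the design even though no single column equals it, so an explicit intercept-only `mod0` is nested in `mod`, while the first column of `mod` alone is not a constant term.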
`v1` is the result of applying `voom` to the DGE. `v1$E` is a matrix of normalized expression values on the log2 scale; it has 13366 rows and 60 columns, with values between -7.6 and 17.9.
Maybe I should provide a null model matrix. When I did so with `sva(v1$E, design, rep(1, 60), n.sv = 2)`, the surrogate variables looked normal: