Surrogate variables affecting just one sample?
@lluis-revilla-sancho
European Union

I have used the package successfully before; however, in this case I can't make sense of the results of the sva function.

I calculated the number of surrogate variables with both methods provided in num.sv:

> num.sv(v1$E, design, "leek")
[1] 2
> num.sv(v1$E, design, "be")
[1] 8

I took the lower number and called the sva function with it and the design matrix:

design <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), .Dim = c(60L, 6L), .Dimnames = list(c("A01",
"A02", "A03", "A04", "A05", "A06", "A07", "A08", "A09", "B10",
"B11", "B12", "B13", "B14", "B15", "B16", "B17", "B18", "C19",
"C20", "C21", "C22", "C23", "C24", "C25", "C26", "C27", "C28",
"C29", "E41", "E42", "E43", "E44", "E45", "E46", "E47", "E48",
"E49", "E50", "E51", "E52", "H72", "H73", "H74", "H75", "H76",
"H77", "H78", "H79", "H80", "I89", "I90", "I91", "I92", "I93",
"I94", "I95", "I96", "I97", "I98"), c("AH", "Non.responders",
"Responders", "C.Comp", "ASH", "Normal")))
sv1 <- sva(v1$E, design, n.sv = 2)

However, the surrogate variables are:

# The sv1$sv  is:
      [,1] [,2]
 [1,]    1    0
 [2,]    0    1
 [3,]    0    0
 [4,]    0    0
 [5,]    0    0
 [6,]    0    0
 [7,]    0    0
 [8,]    0    0
 [9,]    0    0
[10,]    0    0
[i,]    0    0

If I increase the number of surrogate variables, each surrogate variable affects a single sample, following the diagonal. I find it quite strange that there would be a batch effect for just one sample. I plotted the MDS of the samples and couldn't see a batch effect for any of those samples. However, if I change the design (collapsing the previous columns into a single blocking variable), I get other (more typical) surrogate variables.

Is there something in the design or in the way I call sva that would explain this? Or is it simply correct and I am over-thinking it? I think it might be related to "sva: No significant surrogate variables" (https://support.bioconductor.org/p/51754/).

Jeff Leek ▴ 650
@jeff-leek-5015
United States
Something very strange is going on. What are the dimensions and type of data in v1$E?

On Mon, Jan 9, 2017 at 11:02 AM Lluís R [bioc] <noreply@bioconductor.org> wrote:
> Question: Surrogate variables affecting just one sample?
> https://support.bioconductor.org/p/90969/

v1 is the result of applying voom to the DGE. v1$E is a matrix of normalized expression values on the log2 scale; it has 13366 rows and 60 columns, with values between -7.6 and 17.9.

Maybe I should provide a null model matrix. When I did so with `sva(v1$E, design, rep(1, 60), n.sv = 2)`, the surrogate variables looked normal:

            [,1]         [,2]
[1,]  0.08183073 -0.136044169
[2,]  0.08600953 -0.005406838
[3,] -0.23474225 -0.048616698
[4,]  0.08470510  0.024505520
[5,] -0.22132272 -0.052065565
[6,]  0.08502247  0.247920437
Jeff Leek ▴ 650
@jeff-leek-5015
United States

Now I see what was going on. Your original design matrix doesn't have an intercept term (a column of all ones). This is something you should definitely consider including. If sva isn't given a null model, it takes the first column of the design matrix (which is assumed to be an intercept) as the null model. In this case the intercept wasn't there, so you got strange results.

I would use the alternative/null design matrices and consider adding an intercept term to both. 

Jeff
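
The check described in this answer can be sketched in base R. This is only an illustration under stated assumptions, not code from the thread: the toy `group` factor and the helper `has_intercept` are hypothetical, and only the commented-out `sva()` call (with its `mod`, `mod0` and `n.sv` arguments) refers to the sva package itself.

```r
# Toy grouping with three hypothetical groups (not the data from the thread).
group <- factor(rep(c("A", "B", "C"), times = c(3, 3, 4)))

# Cell-means design without an intercept, analogous to the design in the question.
design_no_int <- model.matrix(~ 0 + group)

# Design whose first column is an intercept.
design_int <- model.matrix(~ group)

# Hypothetical helper: TRUE when the first column is all ones.
has_intercept <- function(mod) all(mod[, 1] == 1)

has_intercept(design_no_int)  # first column is a group indicator
has_intercept(design_int)     # first column is the intercept

# Explicit intercept-only null model as a one-column matrix.
mod0 <- model.matrix(~ 1, data = data.frame(group = group))

# With the sva package loaded and an expression matrix `expr`
# (genes in rows, samples in columns), the call would then look like:
# sv <- sva(expr, mod = design_int, mod0 = mod0, n.sv = 2)
```

Passing `mod0` explicitly avoids relying on the first column of the design matrix being an intercept.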


Could you please point me to where this is documented? (I am sorry I missed it.)

I had to change my design matrix to include the intercept, which makes it harder to perform some comparisons. Even with the intercept, using the surrogate variables in the new design didn't improve the distribution of adjusted p-values (i.e. f.pvalue still didn't give a uniform histogram of p-values).
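
On the point that an intercept makes comparisons harder: for a one-way layout in which every sample belongs to exactly one group, the cell-means design and the intercept design span the same column space, so they give the same fitted values and the same group differences. The base R sketch below checks this on toy data; `group`, `y`, `hat_matrix` and the other names are hypothetical, and nothing here uses the dataset from the thread.

```r
set.seed(1)

# Toy one-way layout: each sample belongs to exactly one group.
group <- factor(rep(c("A", "B", "C"), times = c(3, 3, 4)))
y <- rnorm(length(group))

mod_means <- model.matrix(~ 0 + group)  # one column per group, no intercept
mod_int   <- model.matrix(~ group)      # intercept plus two group offsets

# Projection (hat) matrix onto the column space of a full-rank design.
hat_matrix <- function(X) X %*% solve(crossprod(X)) %*% t(X)

# Both designs project onto the same space.
same_space <- isTRUE(all.equal(hat_matrix(mod_means), hat_matrix(mod_int),
                               check.attributes = FALSE))

# The B-versus-A difference can be read from either parameterization.
fit_means <- lm.fit(mod_means, y)
fit_int   <- lm.fit(mod_int, y)
diff_means <- unname(fit_means$coefficients["groupB"] -
                     fit_means$coefficients["groupA"])
diff_int   <- unname(fit_int$coefficients["groupB"])
same_diff <- isTRUE(all.equal(diff_means, diff_int))
```

Under that assumption, one option may be to keep the cell-means design for the comparisons and supply an explicit column of ones as the null model, as in the `sva(v1$E, design, rep(1, 60), n.sv = 2)` call earlier in the thread.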
