Question

Surrogate variables affecting just one sample?


Lluís Revilla Sancho ▴ 760

@lluis-revilla-sancho

Last seen 4 weeks ago

European Union

I have used the package successfully before; however, in this case I can't understand the results of the sva function.

I calculated the number of surrogate variables with both methods provided in num.sv:

> num.sv(v1$E, design, "leek")
[1] 2
> num.sv(v1$E, design, "be")
[1] 8

I took the lower number and called the sva function with it and the design matrix:

design <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), .Dim = c(60L, 6L), .Dimnames = list(c("A01",
"A02", "A03", "A04", "A05", "A06", "A07", "A08", "A09", "B10",
"B11", "B12", "B13", "B14", "B15", "B16", "B17", "B18", "C19",
"C20", "C21", "C22", "C23", "C24", "C25", "C26", "C27", "C28",
"C29", "E41", "E42", "E43", "E44", "E45", "E46", "E47", "E48",
"E49", "E50", "E51", "E52", "H72", "H73", "H74", "H75", "H76",
"H77", "H78", "H79", "H80", "I89", "I90", "I91", "I92", "I93",
"I94", "I95", "I96", "I97", "I98"), c("AH", "Non.responders",
"Responders", "C.Comp", "ASH", "Normal")))
sv1 <- sva(v1$E, design, n.sv = 2)
However the surrogate variables are:
# The sv1$sv  is:
      [,1] [,2]
 [1,]    1    0
 [2,]    0    1
 [3,]    0    0
 [4,]    0    0
 [5,]    0    0
 [6,]    0    0
 [7,]    0    0
 [8,]    0    0
 [9,]    0    0
[10,]    0    0
# (rows 11 to 60 are also 0 0)

If I increase the number of surrogate variables, each additional surrogate variable affects exactly one sample, following the diagonal. I find it quite strange that there would be a batch effect for just one sample. I plotted the MDS of the samples and couldn't see a batch effect for any of those samples. However, if I change the design (where I block all the previous columns into a single variable), I get other, more typical surrogate variables.

Is there something in the design, or in the way I call sva, that would explain this? Or is it simply correct and I am over-thinking it? I think it might be related to the post "sva: No significant surrogate variables" (https://support.bioconductor.org/p/51754/).

sva • 1.2k views

ADD COMMENT • link updated 8.0 years ago by Jeff Leek ▴ 650 • written 8.0 years ago by Lluís Revilla Sancho ▴ 760

score 0 · Answer 1 · 2017-01-09

Something very strange is going on. What are the dimensions and type of data in v1$E?

score 0 · Answer 2 · 2017-01-10


Jeff Leek ▴ 650

@jeff-leek-5015

Last seen 3.9 years ago

United States

Now I see what was going on. In the original design matrix you don't have an intercept term (which would be a column of all ones). This is something you should definitely consider including. If sva isn't given a null model it takes the first column (which is assumed to be an intercept). In this case that wasn't there so you got strange results.

I would use the alternative/null design matrices and consider adding an intercept term to both.

Jeff

ADD COMMENT • link 8.0 years ago Jeff Leek ▴ 650
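
The suggestion in this answer can be sketched in base R. This is a minimal illustration rather than the poster's actual setup: the `pheno` table, its group labels, and the group sizes are made up, and `expr` in the final comment is a hypothetical name for the expression matrix (genes in rows, samples in columns). The sketch compares a no-intercept (cell-means) design like the one in the question with an intercept design plus an intercept-only null model.

```r
# Toy phenotype table; labels and group sizes are illustrative only.
pheno <- data.frame(
  group = factor(rep(c("Normal", "AH", "ASH"), each = 3),
                 levels = c("Normal", "AH", "ASH"))
)

# Cell-means design without an intercept, similar in form to the one in the question.
mod_cellmeans <- model.matrix(~ 0 + group, data = pheno)

# Full model with an intercept, and an intercept-only null model.
mod  <- model.matrix(~ group, data = pheno)
mod0 <- model.matrix(~ 1, data = pheno)

# The first column of the cell-means design is a group indicator, not an intercept.
first_col_is_intercept_cellmeans <- all(mod_cellmeans[, 1] == 1)
first_col_is_intercept_mod       <- all(mod[, 1] == 1)

# Both designs span the same column space, so they fit the same group means.
joint_rank <- qr(cbind(mod, mod_cellmeans))$rank

# With the sva package installed, the call would then pass both models explicitly:
# sv1 <- sva::sva(expr, mod, mod0, n.sv = 2)
```

Passing `mod0` explicitly avoids relying on the first column of `mod` being an intercept, which is the assumption described above.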


Could you please point me to where this is documented? (I am sorry I missed it.)

I had to change my design matrix to include the intercept, which makes it harder to perform some comparisons. Even with the intercept, using the surrogate variables in the new design didn't improve the distribution of adjusted p-values (i.e. the p-values from f.pvalue still didn't give a uniform histogram).

ADD REPLY • link 8.0 years ago Lluís Revilla Sancho ▴ 760
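
On the point about comparisons being harder with an intercept: the same group comparisons can usually be recovered as contrasts of the intercept-model coefficients. The sketch below uses base R `lm` on simulated data (the group labels, sizes, and means are invented for illustration) and assumes default treatment coding with "Normal" as the reference level.

```r
set.seed(1)

# Simulated single-gene example; labels, sizes, and means are illustrative only.
group <- factor(rep(c("Normal", "AH", "ASH"), each = 4),
                levels = c("Normal", "AH", "ASH"))
true_means <- c(Normal = 5, AH = 7, ASH = 6)
y <- rnorm(length(group), mean = true_means[as.character(group)])

# Same data fitted with an intercept and with a cell-means parameterisation.
b_int  <- coef(lm(y ~ group))      # (Intercept), groupAH, groupASH
b_cell <- coef(lm(y ~ 0 + group))  # groupNormal, groupAH, groupASH

# AH vs Normal: a single coefficient with an intercept, a difference of cell means without.
ah_vs_normal_int  <- sum(c(0, 1, 0)  * b_int)
ah_vs_normal_cell <- sum(c(-1, 1, 0) * b_cell)

# AH vs ASH: a difference of two coefficients in both parameterisations.
ah_vs_ash_int  <- sum(c(0, 1, -1) * b_int)
ah_vs_ash_cell <- sum(c(0, 1, -1) * b_cell)

# The two parameterisations agree up to floating-point error.
agree_normal <- isTRUE(all.equal(ah_vs_normal_int, ah_vs_normal_cell))
agree_ash    <- isTRUE(all.equal(ah_vs_ash_int, ah_vs_ash_cell))
```

The contrast vectors change between the two designs, but the estimated differences do not, so adding an intercept need not remove any comparison that the original design allowed.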