SVA package - ERROR: nvobj = sva(edata, mod, mod0, n.sv=n.sv)
1
1
Entering edit mode
aina.jene ▴ 10
@ainajene-12984
Last seen 7.4 years ago
ISGlobal - Barcelona

Hello,

I am currently working on an RNA-seq ExpressionSet from TCGA and I want to proceed with a linear model analysis using voom, but first I need to remove the batch effect. In order to do so, I am using sva package.

My data is:

In this expression set, the dimensions are:

  • pheno: 164         3
  • edata: 20531    164

In pheno we have all samples in rows and each column represents: Call, Status and Id

  • Call: the individual can present “Loss of Chromosome Y”, “Normal” or “XYY”.
  • Status: stands for the origin of each sample “tumor” or “normal”.
  • Id: the id of each sample

Knowing that:

Full model is:

  • mod = model.matrix(~ Call + Status, data = pheno) [both, Call and Status are class factor]

Null model is just like the example in the vignette:

  • mod0 = model.matrix(~ 1, data = pheno)

 

Then, when I try to find the number of surrogate variables (ns <- num.sv(counts, mod, method="leek"), it turns out that there are 160 variable. If I change the method to "be", then the number of surrogate variables decrease to 1.

Finally, when I try the next function: 

  • svobj <- sva(counts, mod, mod0, n.sv=ns)

I ALWAYS get the following error message:

 

" Number of significant surrogate variables is: 1

Iteration (out of 5): Error in density.default(x, adjust = adj): 'x' contains missing values "

 

I have already checked the data and there are no missing values at all, and I have also tried to remove those rows with values equal to 0 in order to reduce the dataset but it did not work either.

Could anybody help me to try and find a solution?

 

Thank you all beforehand,

 

Aina

 

sva removebatcheffect R • 2.3k views
ADD COMMENT
1
Entering edit mode
sina.nassiri ▴ 130
@sinanassiri-10062
Last seen 3.3 years ago
Switzerland/Lausanne

Aina,

  • You mentioned that you want to proceed with a linear model analysis using "voom" and also referred to your input matrix as "counts", so I'm assuming you have count data in hand. If that's the case, you would want to use svaseq() instead of  sva().
  • In regard to formulating the full model, I just want to add that you need to be careful that you don't leave any potentially important biological factor out of equation. For example, in your full model you're assuming that "Call" and "Status" have a linear additive effect on gene expression. I don't know enough about your data, but could there be interaction between "Call" and "Status"? Keep in mind that any effects that are not explicitly accounted for in the full model may be removed by sva/svaseq as unwanted effects.

mod1 = model.matrix(~ Call + Status, data = pheno)

or

mod2 = model.matrix(~ Call + Status + Call:Status, data = pheno)

  • In calculating number of surrogate variables, "be" is the recommended default method based on permutation. "leek" is an alternative asymptotic approach, which may perform better only when dealing with large samples.
  • Finally, regarding the error message, it's hard to tell without having access to your code and input file.
  • One last point since you mentioned your data is from TCGA, are you familiar with MBatch? MBatch provides visualization tools to assess batch effects in TCGA data. You can also obtain data already adjusted for batch effects using different methods including ComBat from the sva package. It might be worth exploring.

Hope these help!

ADD COMMENT
0
Entering edit mode

Dear Sina Nassiri,

Thanks a lot for your help. I will take a look to all your points. It was very helpful!

Aina

ADD REPLY

Login before adding your answer.

Traffic: 1018 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6