normalization affects drastically limma-voom + sva results
1
0
Entering edit mode
aec ▴ 90
@aec-9409
Last seen 4.6 years ago

Dear all,

Depending on the normalization approach (none, quantile, TMM  or DESeq2) applied to the limma-voom function, the number of surrogate variables found by SVA and number of differentially expressed genes changes a lot. My question is, does SVA replace the normalization step? For example, if the RNA-seq samples have different sequencing depths, is this a technical factor that SVA corrects for?

Thanks,

normalization sva limma-voom • 2.6k views
ADD COMMENT
1
Entering edit mode
@gordon-smyth
Last seen 5 minutes ago
WEHI, Melbourne, Australia

No, sva does not take the place of normalization. It could ameliorate a problems slightly if you failed to normalize properly, but that is hardly the right way to go.

Note also that limma-voom always corrects for sequencing depth, even if you choose normalize="none".

It is not my experience that choosing between "TMM", "DESeq2" or quantile normalization drastically affects results from limma-voom.

ADD COMMENT
0
Entering edit mode

Gordon,

My problem is that I have been trying SVA with different normalization options as follows:

y=DGEList(counts=counts)
A<-rowSums(y$counts)
isexpr<-A>500
y=y[isexpr,keep.lib.size=FALSE]
yy=calcNormFactors(y)
mod <- model.matrix(~group, data=info)
mod0 <- model.matrix(~1, data=info)
#option 1
v=voom(counts=y,design=mod)
n.sv = num.sv(v$E,mod,method="leek")
n.sv
1
#option 2
v=voom(counts=y,design=mod, normalize="quantile")
n.sv = num.sv(v$E,mod,method="leek")
n.sv
1
#option 3
v=voom(counts=yy,design=mod)
n.sv = num.sv(v$E,mod,method="leek")
n.sv
1
#option 4
n.sv = num.sv(yy,mod,method="leek")
n.sv
57
#option 5
n.sv = num.sv(y,mod,method="leek")
n.sv
57
#option 6
n.sv = num.sv(dat,mod,method="leek")
n.sv
57

For option 6 I followed :https://www.bioconductor.org/help/workflows/rnaseqGene/. Why is it that the number of sv depends so much on applying 'voom' or not?

 

ADD REPLY
0
Entering edit mode

It is completely incorrect to input counts to num.sv() as if they were microarray expression values. You have to use voom() or cpm() to convert the counts to a scale on which a microarray-type analysis makes sense.

When you do use voom(), you get the same number of surrogate variables regardless of what normalization method you used, just as I suggested to you would happen.

ADD REPLY
0
Entering edit mode

Ok, then option 5 is discarded. But what about the rest?

option 4 used TMM normalization and option 6 uses DESeq2 normalization, they both give 57 sv.

options 1,2,3 use voom and give 1 sv.

option 6 follows this code https://www.bioconductor.org/help/workflows/rnaseqGene/ :

dat  <- counts(dds, normalized = TRUE)
idx  <- rowMeans(dat) > 1
dat  <- dat[idx, ]
mod  <- model.matrix(~ dex, colData(dds))
mod0 <- model.matrix(~   1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv = 2)

 

ADD REPLY
0
Entering edit mode

Option 4 is counts. Option 5 is counts. Option 6 is counts. They are all incorrect. Option 6 is not the same as the rnaseq workflow.

If you use voom, you get the correct result ( n.sv=1 ).

If you ignore voom and just use counts, you get the wrong result ( n.sv=57 ).

That's pretty clear, is it not?

ADD REPLY
0
Entering edit mode

You are right Gordon, if I add this step:

logcpm <- cpm(yy, prior.count=2, log=TRUE)
n.sv = num.sv(logcpm,mod,method="leek")
n.sv
[1] 1

I got the same results as 'voom'. 

If I want to calculate n.sv using the rnaseq DESeq2 worklow, how to do it? it is not specified. Should I use the 'rlog' or 'vst' transformations?

 

ADD REPLY

Login before adding your answer.

Traffic: 352 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6