Question

Can I use length bias normalization from packages like EDASeq or cqn as input to voom?

1

Ahdee ▴ 60

@ahdee-8938

Last seen 5 months ago

United States

Hi, there has recently been a publication on the importance of GC and length bias (Mandelboum et al, 2019, PLOS). I'm looking into how to correct for this and came across packages like EDASeq. It looks pretty straightforward, with something like this:

dataOffset <- withinLaneNormalization(data,"gc",
                                      which="full",offset=TRUE)

This provides two slots, one for the normalized counts and the other for the offsets. I'm wondering whether I can then use the normalized counts from dataOffset, say normalized_count, as input for TMM normalization followed by voom. Would something like this work?

y.df <- calcNormFactors(y.normalized_count, method = "TMM")
v <- voom(y.df)

and then just do the limma DGE?

thanks!

limma voom edaseq edger rnaseq • 2.0k views

ADD COMMENT • link updated 5.4 years ago by Gordon Smyth 52k • written 5.4 years ago by Ahdee ▴ 60

score 3 · Answer 1 · 2019-12-17

3

Gordon Smyth 52k

@gordon-smyth

Last seen just now

WEHI, Melbourne, Australia

I did a careful examination of the phenomenon reported by Mandelboum et al (PLOS Biology 2019) and I re-examined some of our historical dataset analyses using limma and edgeR in this light. I agree that the length bias phenomenon does exist for some datasets, but it is very unlikely to cause any serious problems if you use a standard limma-voom or edgeR pipeline as described in the current documentation.

The phenomenon manifests as an expression bias for very short genes in some samples. The bias can be in either direction (up or down) for an individual sample but is consistent for all the affected genes in that sample. The latter fact induces an inter-gene correlation. The reasons that limma-voom analyses are not much impacted are that:

1. the effect is too small to be important at the individual gene level, and
2. at the pathway level, the limma functions roast(), fry() and camera() prevent the bias from having any impact because they correct for inter-gene correlation (as Mandelboum et al note in their paper).

Note on the other hand that preranked GSEA is sensitive to the problem and does not provide the protection that roast(), fry() and camera() do.
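As a rough illustration of the gene set tests named above, here is a minimal sketch on simulated counts. The count matrix, the group labels and the gene sets `setA` and `setB` are all made up for illustration and do not come from any dataset discussed in this thread.

```r
library(edgeR)  # edgeR depends on limma, so limma functions are attached too

set.seed(2)
# Simulated raw counts: 1000 genes x 6 samples (illustrative only)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Standard TMM + voom on the raw counts
y <- calcNormFactors(DGEList(counts = counts), method = "TMM")
v <- voom(y, design)

# Hypothetical gene sets, converted to row indices of the expression matrix
sets <- list(setA = paste0("gene", 1:50), setB = paste0("gene", 101:180))
idx  <- ids2indices(sets, rownames(v$E))

# Self-contained (fry) and competitive (camera) gene set tests for group B vs A
fry.res <- fry(v, index = idx, design = design, contrast = 2)
cam.res <- camera(v, index = idx, design = design, contrast = 2)
```

Both functions take the voom output directly, so the precision weights are carried into the gene set tests.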

Regarding your actual question, I am not familiar with the output from withinLaneNormalization so I can't comment, except to say that calcNormFactors and voom expect to get actual counts and not computed quantities. As far as I can see, the EDASeq documentation does not explain what is meant by a "normalized count", but the EDASeq vignette does say that "normalized counts" can be input to edgeR and DESeq. If "normalized counts" are suitable for edgeR, then they will certainly be suitable for limma also (but I have not tried this).
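For reference, a minimal sketch of the standard pipeline that starts from actual counts could look like the following. The data are simulated and the object names (`counts`, `group`, `design`) are assumptions for illustration. Whether EDASeq "normalized counts" can be substituted for `counts` here is exactly the open point above and is not tested by this sketch.

```r
library(edgeR)  # edgeR depends on limma, so voom(), lmFit() etc. are attached too

set.seed(1)
# Simulated raw counts: 1000 genes x 6 samples (illustrative only)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Build the DGEList from raw counts and drop lowly expressed genes
y    <- DGEList(counts = counts, group = group)
keep <- filterByExpr(y, design)
y    <- y[keep, , keep.lib.sizes = FALSE]

# TMM scaling factors, then voom to get log-CPM values with precision weights
y <- calcNormFactors(y, method = "TMM")
v <- voom(y, design)

# Usual limma linear model and moderated t-statistics
fit <- eBayes(lmFit(v, design))
top <- topTable(fit, coef = 2, number = 10)
```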

Your question is a bit curious in that Mandelboum et al only find length bias to be a problem, whereas your code deals only with GC bias and ignores length, so there is a disconnect between the results of the cited paper and your code.

ADD COMMENT • link 5.4 years ago Gordon Smyth 52k

0

Hey Gordon, thanks for looking over this. One of my main concerns is indeed that I use a preranked GSEA approach, but in light of your comment I will take a look at the limma equivalent.

Sorry for the confusion, I pulled the example directly from the vignette. What I would probably do (if I do decide to do it) is normalize for both 'gc' and 'length' bias. As for input to edgeR, my reading was that it can only be used as an offset, so I was not sure whether it can be input into calcNormFactors for TMM.

ADD REPLY • link 5.4 years ago Ahdee ▴ 60

0

Offsets can't be input to calcNormFactors, but they would make almost no difference to TMM anyway, so IMO ignoring them at the TMM step should be ok.

ADD REPLY • link 5.4 years ago Gordon Smyth 52k

0

Thanks again @Gordon. Does the limma romer function also protect against length bias? I tried reading the help but I don't understand what the correlation parameter is for.

Or in this case should I just stick with camera? Thanks.

ADD REPLY • link 5.4 years ago Ahdee ▴ 60

0

Yes, romer accounts for inter-gene correlation but I use camera myself. camera will give strongest protection if you set inter.gene.cor=NA but is then quite conservative.
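To make the inter.gene.cor choice concrete, here is a small sketch comparing camera with its preset inter-gene correlation against inter.gene.cor=NA, where the correlation is estimated for each set. The simulated counts and the gene sets `setA` and `setB` are hypothetical and only serve to make the calls runnable.

```r
library(edgeR)  # edgeR depends on limma, so camera() is attached too

set.seed(3)
# Simulated raw counts: 1000 genes x 6 samples (illustrative only)
counts <- matrix(rnbinom(6000, mu = 50, size = 5), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group  <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)
v <- voom(calcNormFactors(DGEList(counts = counts)), design)

# Hypothetical gene sets as row indices
sets <- list(setA = paste0("gene", 1:50), setB = paste0("gene", 101:180))
idx  <- ids2indices(sets, rownames(v$E))

# Preset inter-gene correlation (0.01 is the documented default)
cam.preset <- camera(v, index = idx, design = design, contrast = 2,
                     inter.gene.cor = 0.01)

# Inter-gene correlation estimated separately for each gene set
cam.est <- camera(v, index = idx, design = design, contrast = 2,
                  inter.gene.cor = NA)
```

Comparing the PValue columns of the two results on real data gives a feel for how much more conservative the estimated-correlation version is.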

ADD REPLY • link 5.4 years ago Gordon Smyth 52k

0

Thanks again. I like camera as well since it can use the precision weights from voom!

ADD REPLY • link 5.4 years ago Ahdee ▴ 60

0

Just in case someone is interested, here is a link to the primary camera article by Di Wu and Gordon Smyth: https://academic.oup.com/nar/article/40/17/e133/2411151

and a good summary of different pathway tools from limma: http://web.mit.edu/~r/current/arch/i386_linux26/lib/R/library/limma/html/10GeneSetTests.html

ADD REPLY • link 5.4 years ago Ahdee ▴ 60