Question

Correct use of tximport in combination with edgeR cpm()

1

Entering edit mode

ATpoint ★ 4.8k

@atpoint-13662

Last seen 1 day ago

Germany

I imported a set of salmon quantifications into R with tximport default settings and exactly used the code on the manual page for tximport to prepare data for use with edgeR. The result is a DGElist with the offsets for the downstream DGE analysis.

Issue: The DGElist (y$samples) does not contain the lib.size factors (they are all 1) for obtaining TMM-normalized counts via cpm(y, log=F).

Therefore, the question is how to feed normalization factors into y$samples$norm.factors while still using the information from tximport. One can of course run calcNormFactors(y) manually but then the length offsets from tximport are lost. Is there a recommended approach?

Edit 01/2020: For those interested, the code suggestions have now been integrated into the tximport manual, see: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#edger

Thanks to Aaron and Mike for the support!

tximport edgeR • 2.8k views

ADD COMMENT • link 6.0 years ago • updated 5.3 years ago ATpoint ★ 4.8k

0

Entering edit mode

This is beyond my knowledge of edgeR. I checked that chunk of tximport vignette code with Aaron at some point, to make sure we were doing it properly.

ADD REPLY • link 6.0 years ago Michael Love 43k

0

Entering edit mode

Oops. Spotted something. Will open an issue.

Edit: actually, ignore that, it's fine - phew.

ADD REPLY • link 6.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Maybe add a short comment to the tximport vignette referencing the suggestions from Aaron below. Using the corrected counts for things like clustering etc. is standard so I was actually surprised no one asked this before (by best knowledge, maybe I missed the respective threads).

ADD REPLY • link 6.0 years ago ATpoint ★ 4.8k

1

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 14 hours ago

United States

If you have an offsets matrix in your DGEList then you won't use the norm.factors anyway, so it wouldn't matter if you did something with them or not. Put a different way, the offsets are supposed to be better than simple normalization factors, and are preferentially used by glmFit.

ADD COMMENT • link 6.0 years ago James W. MacDonald 68k

0

Entering edit mode

This I understand but how about obtaining normalized counts for non-GLM applications like clustering or checking normalization efficiency by MA plots. Can you use the offsets for those, too?

ADD REPLY • link 6.0 years ago ATpoint ★ 4.8k

score 3 · Accepted Answer · 2019-05-15

3

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 1 hour ago

The city by the bay

Have a look at csaw::calculateCPM(), which does exactly as you request (see usage here). You'll need to convert it back into a SummarizedExperiment, though, the function doesn't take DGEList objects... or you can use csaw::normFactors() instead of calcNormFactors() to keep everything in a SummarizedExperiment form. (Note the difference in the weighted default, though, as this was built for ChIP-seq data.)

ADD COMMENT • link 6.0 years ago Aaron Lun ★ 28k

0

Entering edit mode

Thanks Aaron,

I think that should do it. If you move your comment to answer I can accept it.

For completeness, here is the code I used:

## convert DGElist to SummarizedExperiments given a DGElist "y" from the code in toplevel question
library(csaw)
se <- SummarizedExperiment(assays = y$counts)
names(assays(se))[1] <- "counts"
se$totals <- y$samples$lib.size
assay(se, "offset") <- y$offset
se.cpm <- calculateCPM(se, use.norm.factors = F, use.offsets = T, log = F)

ADD REPLY • link 6.0 years ago ATpoint ★ 4.8k

0

Entering edit mode

hi Aaron and AT,

I'll add this to the tximport vignette if Aaron gives the ok. I'm just less knowledgeable about internals so want to make sure I don't promulgate something not accurate.

ADD REPLY • link 5.9 years ago Michael Love 43k

0

Entering edit mode

Looks fine to me. You needn't use.norm.factors if you have use.offsets=TRUE, the latter overrides the former. The only other comments are to avoid T and F, but I know that Mike would never put those in a vignette anyway.

Mike, if you open a PR on the vignette, I can put in some comments to explain what and why, especially around the offset calculation part. Otherwise I'll have to re-remember everything the next time this pops up.

ADD REPLY • link 5.9 years ago Aaron Lun ★ 28k

1

Entering edit mode

I added the following to the vignette in the devel branch of tximport:

https://github.com/mikelove/tximport/commit/225953efef09f2a925c99242034abfa4d933a0f7

Let me know if that looks ok.

Thanks AT

ADD REPLY • link 5.9 years ago Michael Love 43k

0

Entering edit mode

Thank you both for the outstanding responsiveness to questions and issues. Can you move Aaron's comment to answer so it is the toplevel answer?

ADD REPLY • link 5.9 years ago ATpoint ★ 4.8k