I am trying to better understand how to use tximport and DESeq2 together. Both packages provide a good summary. From the tximport vignette:
Note: there are two suggested ways of importing estimates for use with differential gene expression (DGE) methods. The first method, which we show below for edgeR and for DESeq2, is to use the gene-level estimated counts from the quantification tools, and additionally to use the transcript-level abundance estimates to calculate a gene-level offset that corrects for changes to the average transcript length across samples. ... the function DESeqDataSetFromTximport takes care of creation of the offset for you. Let’s call this method “original counts and offset”. The second method is to use the tximport argument countsFromAbundance="lengthScaledTPM" or "scaledTPM", and then to use the gene-level count matrix txi$counts directly as you would a regular count matrix with these software. Let’s call this method “bias corrected counts without an offset”
Judging from the DESeqDataSetFromTximport() source, it appears to handle a tximport object properly regardless of the countsFromAbundance setting:
stopifnot(txi$countsFromAbundance %in% c("no","scaledTPM","lengthScaledTPM"))
if (txi$countsFromAbundance %in% c("scaledTPM","lengthScaledTPM")) {
  message("using just counts from tximport")
} else {
  message("using counts and average transcript lengths from tximport")
  lengths <- txi$length
  stopifnot(all(lengths > 0))
  dimnames(lengths) <- dimnames(object)
  assays(object)[["avgTxLength"]] <- lengths
}
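To check my reading of that branch, here is a minimal sketch rather than a definitive recipe. It assumes the tximportData example package is installed and laid out as in the tximport vignette (samples.txt, salmon/<run>/quant.sf.gz, tx2gene.gencode.v27.csv). The two-group condition column is a placeholder I made up, not the real design of those samples.

```r
library(tximport)
library(tximportData)
library(DESeq2)

# Example Salmon output shipped with tximportData (assumed layout, see above)
dir <- system.file("extdata", package = "tximportData")
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
files <- file.path(dir, "salmon", samples$run, "quant.sf.gz")
names(files) <- paste0("sample", seq_along(files))
tx2gene <- read.csv(file.path(dir, "tx2gene.gencode.v27.csv"))

# Placeholder design: alternating two-level factor, for illustration only
coldata <- data.frame(
  condition = factor(rep(c("A", "B"), length.out = length(files))),
  row.names = names(files)
)

# Method 1: original counts and offset (countsFromAbundance defaults to "no")
txi_offset <- tximport(files, type = "salmon", tx2gene = tx2gene)

# Method 2: bias corrected counts without an offset
txi_scaled <- tximport(files, type = "salmon", tx2gene = tx2gene,
                       countsFromAbundance = "lengthScaledTPM")

# Both tximport objects go through the same constructor
dds_offset <- DESeqDataSetFromTximport(txi_offset, colData = coldata,
                                       design = ~ condition)
dds_scaled <- DESeqDataSetFromTximport(txi_scaled, colData = coldata,
                                       design = ~ condition)

# Is the average transcript length assay stored in each case?
has_offset <- "avgTxLength" %in% assayNames(dds_offset)
has_scaled <- "avgTxLength" %in% assayNames(dds_scaled)

# Alternative route for method 2: pass the (rounded) count matrix directly
dds_matrix <- DESeqDataSetFromMatrix(round(txi_scaled$counts),
                                     colData = coldata,
                                     design = ~ condition)
same_counts <- all(counts(dds_scaled) == counts(dds_matrix))
```

If the branch works the way I read it, has_offset should be TRUE and has_scaled FALSE, and same_counts should tell me whether passing the tximport object or the rounded matrix gives the same counts for method 2.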
So using "lengthScaledTPM" or "scaledTPM" seems more flexible for DESeq2, since it lets you pass either the count matrix or the tximport object. Is that correct? Maybe I am misinterpreting, but the note makes it sound like the offset method is preferred. Using "lengthScaledTPM" or "scaledTPM", as in the limma-voom workflow, also seems like it would be simpler for other workflows. Is there a downside to that approach?
Thanks for clarifying. I actually didn't realize that the two approaches are not completely identical.