Hi,
I am using Salmon to quantify reads in transcripts and aggregating them with tximport. I load the resulting object into a DESeq object with DESeqDataSetFromTximport
and proceed as described in the vignette. Normally I like to present normalized expression data of certain genes as CPM/FPM, which can be conveniently achieved with the fpm
function. However, when data is obtained via tximport, average transcript lengths are present in the DESeq object and the fpm
function does not apply any normalization.
Now, I am considering three ways to deal with this, but I am not sure which is more appropriate:
1) Calculate FPM on the normalized counts:
k <- counts(object, normalized=TRUE)
library.sizes <- colSums(k)
1e+06 * sweep(k, 2, library.sizes, "/")
2) Estimate the sizeFactors of the DESeq object and proceed as usual:
k <- counts(object, normalized=FALSE)
sf <- estimateSizeFactorsForMatrix(k)
library.sizes <- sf * exp(mean(log(colSums(k))))
1e+06 * sweep(k, 2, library.sizes, "/")
3) Same as above, but dividing by average transcript length
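sf <- estimateSizeFactorsForMatrix(counts(object))
# scale each column (sample) by its size factor; a plain sf / matrix would recycle sf down the rows
nf <- sweep(1 / assays(object)[["avgTxLength"]], 2, sf, "*")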
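
To make the difference between 1) and 2) concrete, here is a minimal base-R sketch on a made-up count matrix. It is only an illustration under simplifying assumptions: the counts are invented, the size factors are computed by hand with a median-of-ratios formula (assumed to behave like estimateSizeFactorsForMatrix on a matrix without zeros), and only per-sample size factors are modeled, not the per-gene normalization factors that come with avgTxLength. In this simplified setting the size factors cancel in 1), so it reduces to plain CPM on the raw counts, while 2) does not force each column to sum to one million.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers, illustration only)
k <- matrix(c(10, 20, 30, 40,
              20, 40, 60, 80,
              15, 25, 35, 45),
            nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Hand-rolled median-of-ratios size factors (assumption: stands in for
# estimateSizeFactorsForMatrix(); valid here because there are no zero counts)
log_geo_means <- rowMeans(log(k))
sf <- apply(k, 2, function(cnts) exp(median(log(cnts) - log_geo_means)))

# Option 1: CPM computed on size-factor-normalized counts
k_norm <- sweep(k, 2, sf, "/")
fpm1 <- 1e6 * sweep(k_norm, 2, colSums(k_norm), "/")

# Option 2: raw counts divided by size-factor-based library sizes
lib_sizes <- sf * exp(mean(log(colSums(k))))
fpm2 <- 1e6 * sweep(k, 2, lib_sizes, "/")

# Plain CPM on raw counts, for comparison with option 1
cpm_raw <- 1e6 * sweep(k, 2, colSums(k), "/")

colSums(fpm1)  # each column sums to one million
colSums(fpm2)  # columns are not constrained to sum to one million
```

With real tximport data, counts(object, normalized=TRUE) may divide by a gene-by-sample matrix rather than a per-sample vector, in which case the cancellation seen for 1) in this toy example would no longer hold.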
Which of these would be more correct in this case? Is there a better alternative?