Hi, I am using tximport to summarize transcript-level expression data from Salmon to gene-level expression data. I have read through the documentation, but I am still unsure how to interpret the "counts" and "abundance" matrices. As far as I understand:
- Counts = best estimate of the original counts
- Abundance = TPMs (at least when using Salmon input data)
I have gathered the data in two different ways:
# With the default countsFromAbundance setting ("no")
txi.salmon <- tximport(files, type = "salmon", txIn = TRUE, tx2gene = tx2gene, ignoreTxVersion = TRUE)
# With countsFromAbundance = "scaledTPM"
txi.scaled_tpm <- tximport(files, type = "salmon", txIn = TRUE, tx2gene = tx2gene, ignoreTxVersion = TRUE, countsFromAbundance = "scaledTPM")
# Comparing the counts matrices (number of entries that differ)
sum(txi.scaled_tpm$counts != txi.salmon$counts)
# [1] 1079547
# Comparing the abundance matrices (number of entries that differ)
sum(txi.scaled_tpm$abundance != txi.salmon$abundance)
# [1] 0
- Why do the counts matrices differ?
- Are the counts an estimate of the original counts and the abundance the TPMs?
Thanks for the fast answer. There is still something I don't understand: why does the counts matrix differ when I set a different option for countsFromAbundance?
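Those options (such as "scaledTPM" and "lengthScaledTPM") are methods that directly modify the counts: the counts are regenerated from the abundances and scaled up to each sample's library size, so they no longer match the estimated counts from Salmon. The default ("no") does not modify the counts; differences in average transcript length across samples are instead handled downstream with an offset (built from txi$length). The abundance matrix is the same in both cases, which is why your second comparison returns 0.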
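To make the difference concrete, here is a minimal base-R sketch of the idea behind "scaledTPM" as I understand it: take the TPMs and rescale each sample's column so that it sums to that sample's original library size. The numbers are made up and the variable names (est_counts, abundance, scaled_counts) are hypothetical; this is an illustration, not tximport's actual code.

```r
# Toy data: 3 genes x 2 samples (made-up numbers, not real Salmon output).
est_counts <- matrix(c(100, 300, 600,
                       50, 150, 800),
                     nrow = 3,
                     dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
abundance <- matrix(c(400000, 350000, 250000,
                      300000, 200000, 500000),
                    nrow = 3,
                    dimnames = dimnames(est_counts))

# Original library size of each sample (column sums of the estimated counts).
lib_size <- colSums(est_counts)

# scaledTPM-style counts: multiply each column of TPMs by a per-sample factor
# so that the column sums to the sample's library size.
scaled_counts <- sweep(abundance, 2, lib_size / colSums(abundance), `*`)

# Library sizes are preserved (up to floating-point error) ...
all.equal(colSums(scaled_counts), lib_size)
# ... but the individual entries no longer match the estimated counts.
sum(scaled_counts != est_counts)
```

The per-sample totals stay the same while the individual entries change, which is the pattern seen in the comparison above: many differing count entries, identical abundances.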
Thanks a lot, now I get it.