Non-integer values in salmon counts change DE analyses
1
0
Entering edit mode
mlosada323 • 0
@mlosada323-23593
Last seen 4.3 years ago

Hi all,

Salmon produces transcript counts and some include decimals. I noticed that if I import my quant files from salmon (244 files) with non-integer values and then run DESeq I detected 327 DE genes.

txi <- tximport(files, type = "salmon", tx2gene = txdf, ignoreTxVersion = T)
ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ RSV_NPA)
ddsTxi <- DESeq(ddsTxi)
res <- results(ddsTxi)
summary(res)

out of 18815 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 195, 1%
LFC < 0 (down)     : 132, 0.7%
outliers [1]       : 0, 0%
low counts [2]     : 5868, 31%
(mean count < 7)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

But if I export the unnormalized counts of the just imported file (all integer numbers now) and then reload it as matrix and run DESeq I get only 45 DE genes

dds_counts <- as.data.frame(counts(ddsTxi, normalized=FALSE))
write.table(as.data.frame(dds_counts), file="dds_Ens.txt", sep="\t"
counts <-read.table("dds_Ens.txt",h=T,row.names = 1,sep='\t')
ddsTxi <- DESeqDataSetFromMatrix(countData =counts, colData = samples, design =  ~RSV_NPA)

out of 18812 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 36, 0.19%
LFC < 0 (down)     : 9, 0.048%
outliers [1]       : 0, 0%
low counts [2]     : 10611, 56%
(mean count < 46)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Interestingly DESeqDataSetFromMatrix does not allow for non-integer values in the matrix but DESeqDataSetFromTximport does.

Then I tested the tximport and salmon files in the DESeq tutorial below and observed the same issue if you repeat the steps above http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#how-to-get-help-for-deseq2

Hence could you explain this issue and suggest how to proceed?

Best

Marcos

deseq2 • 4.3k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 7 hours ago
United States

The issue is not the integers or not. DESeqDataSetFromTximport rounds incoming fractional counts to integer values. So both methods are working with rounded estimated counts.

The difference is that DESeqDataSetFromTximport uses effective transcript lengths as an offset while DESeqDataSetFromMatrix does not. See the citation for tximport for more details on transcript length as an offset.

How to proceed: use tximport for importing transcript level quantification data for gene or transcript level analyses.

ADD COMMENT
0
Entering edit mode

Thanks for the clarification Michael. Hence, how could I share with colleagues a file with my imported salmon counts based on effective transcript lengths? I initially thought that exporting the DEseq counts would do, but that's clearly not the case. I could also share the salmon quant files, but that seems a bit cumbersome given the high number of file to share. Thanks again for your valuable input, it's greatly appreciated

ADD REPLY
1
Entering edit mode

For computational reproducibility, you definitely want to share the full Salmon output directories, either zipped or using tar -czf quants.tgz .... This allows collaborators to later determine what transcript references were used (what source and what release). See our recent paper on computational reproducibility for RNA-seq:

https://doi.org/10.1371/journal.pcbi.1007664

Or you could share the txi or dds objects.

If they don't want to use R and you just want to share the data, and it's ok if the metadata is lost (e.g. what version of Salmon was used, what options were used, what reference transcripts were used, ...), you should write out the three matrices stored in txi (the counts, the lengths, and the abundance) to plain text. Then later someone could recreate the analysis by reading those matrices in to R, creating a list (txi) and providing this to DESeqDataSetFromTximport.

ADD REPLY
0
Entering edit mode

Excellent. Thanks for the reference!!

ADD REPLY

Login before adding your answer.

Traffic: 540 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6