Hi Michael,
I read your DESeq2 vignette: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html
and found that a DESeqDataSet can be built either from Salmon output (transcript abundance) or from a count matrix.
Have you ever compared the results of these two workflows (salmon->tximport->DESeq2 versus counts->DESeq2) for differential gene expression calls on the same sequencing dataset?
Thanks.
C
Michael,
Thank you very much for your quick response!
To make sure my understanding is correct, I found the following paper:
https://www.ncbi.nlm.nih.gov/pubmed/26925227
And your conclusion is that "salmon->tximport->DESeq2" is better than "counts->DESeq2"?
Kind regards,
C.
Yes, the advantages are that it protects against estimation bias from DTU (differential transcript usage), enables certain fragment-level biases to be estimated, and preserves multimapping reads.
I have different concerns, actually:
Count data usually come from genome alignment, whereas Salmon data come from transcriptome alignment. I found that the counts produced by tximport did not really match the genome alignment-based counts...
What would be the point of tximport if you got the same thing as the genome-based alignment? Put another way, both alignment to the genome with subsequent counting and alignment to the transcriptome and then collapsing to the gene level are attempts to get at the same thing - the relative amount of transcript in a given sample for each gene. But we don't know how much transcript there is!
The fact that two different methods of estimating some underlying (unobserved) quantity don't necessarily agree doesn't invalidate either of them, because we don't know what the base truth is. If you want to believe that aligning to the genome and then generating counts is 'the right way to do things', then you should do that. If you are persuaded by Mike's paper that you get better results aligning to the transcriptome and then summarizing using tximport, then you should do that instead. But comparing the two and noting they are different doesn't tell you anything because the only reason for having a different method is because it's different than what came before.
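To make "collapsing to the gene level" concrete, here is a minimal Python sketch of the basic idea: summing transcript-level estimated counts into gene-level totals through a transcript-to-gene map. The names `summarize_to_gene`, `tx_counts`, and `tx2gene`, as well as the transcript and gene IDs, are made up for illustration. This is not tximport's actual implementation, which is an R package and also handles things like average transcript length offsets.

```python
from collections import defaultdict


def summarize_to_gene(tx_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level totals.

    tx_counts: dict mapping transcript ID -> estimated count (hypothetical data)
    tx2gene:   dict mapping transcript ID -> gene ID (hypothetical mapping)
    """
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        # Every transcript contributes its estimated count to its parent gene.
        gene_counts[tx2gene[tx]] += count
    return dict(gene_counts)


# Toy example: two transcripts of geneA and one transcript of geneB.
tx_counts = {"txA1": 120.5, "txA2": 30.0, "txB1": 75.25}
tx2gene = {"txA1": "geneA", "txA2": "geneA", "txB1": "geneB"}

gene_counts = summarize_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

Note that the estimated counts are not integers, because reads that map to several transcripts are assigned fractionally. That is one reason gene-level totals obtained this way need not equal counts from genome alignment followed by counting.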
Hi, James,
As you mentioned, we don't know what the base truth is. So when the outputs differ, I would just like to know whether anyone has ever tested which one makes more sense...
C.
Yes. Mike did, in the tximport paper that he already mentioned. Have you read it?
Good points. Just want to point out that Charlotte Soneson is the first author.