Question

salmon output for DESeq2 analysis

capricygcapricyg

Hi Michael,

I have read your DESeq2 vignette: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html

and noticed that a DESeqDataSet can be built either from salmon output (transcript abundance estimates) or from a count matrix.

Have you ever compared the results of these two pipelines (salmon -> tximport -> DESeq2 versus counts -> DESeq2) for differential expression calls on the same sequencing dataset?

Thanks.

C


Answer · 2018-12-17

Michael Love

Yes, such comparisons were made in the tximport publication.


Michael,

Thank you very much for your quick response!

To check my understanding, I looked up the following paper:

https://www.ncbi.nlm.nih.gov/pubmed/26925227

Is your conclusion, then, that "salmon -> tximport -> DESeq2" is better than "counts -> DESeq2"?

Kind regards,

C.

Reply by capricygcapricyg

Yes. The advantages are that it protects against bias in gene-level estimates caused by differential transcript usage (DTU), enables certain fragment-level biases to be estimated, and preserves multimapping reads.

Reply by Michael Love
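To make the DTU point concrete, here is a minimal Python sketch of how one sample's transcript-level salmon estimates might be collapsed to a single gene-level count plus an abundance-weighted average transcript length. Roughly speaking, tximport hands DESeq2 this kind of per-sample length to use as an offset. This is only an illustration, not tximport's code (tximport is an R package). The function name `summarize_to_gene`, the tuple layout, the zero-abundance fallback, and the two-isoform toy numbers are all hypothetical.

```python
from collections import defaultdict


def summarize_to_gene(tx_rows, tx2gene):
    """Collapse one sample's transcript-level estimates to gene level.

    tx_rows: iterable of (transcript_id, tpm, est_count, effective_length)
    tx2gene: dict mapping transcript_id -> gene_id
    Returns (gene_counts, gene_avg_length), both dicts keyed by gene_id.
    """
    counts = defaultdict(float)
    tpm_sum = defaultdict(float)
    tpm_weighted_len = defaultdict(float)
    all_lengths = defaultdict(list)
    for tx, tpm, est_count, eff_len in tx_rows:
        gene = tx2gene[tx]
        counts[gene] += est_count  # gene count = sum of its transcripts' counts
        tpm_sum[gene] += tpm
        tpm_weighted_len[gene] += tpm * eff_len
        all_lengths[gene].append(eff_len)

    avg_len = {}
    for gene in counts:
        if tpm_sum[gene] > 0:
            # abundance-weighted mean effective length for this sample
            avg_len[gene] = tpm_weighted_len[gene] / tpm_sum[gene]
        else:
            # hypothetical fallback: unweighted mean when the gene has no abundance
            avg_len[gene] = sum(all_lengths[gene]) / len(all_lengths[gene])
    return dict(counts), avg_len


# Hypothetical toy data: one gene, two isoforms of effective length 1000 and 3000.
# Isoform usage flips between the two samples.
tx2gene = {"tx1": "geneA", "tx2": "geneA"}
sample1 = [("tx1", 90.0, 450.0, 1000.0), ("tx2", 10.0, 150.0, 3000.0)]
sample2 = [("tx1", 10.0, 50.0, 1000.0), ("tx2", 90.0, 1350.0, 3000.0)]

for name, rows in [("sample1", sample1), ("sample2", sample2)]:
    counts, avg_len = summarize_to_gene(rows, tx2gene)
    ratio = counts["geneA"] / avg_len["geneA"]  # count per unit of average length
    print(name, counts["geneA"], avg_len["geneA"], ratio)
```

In this toy case the summed counts differ (600 vs 1400) only because the longer isoform dominates in sample 2. The average length also shifts (1200 vs 2800), so count divided by length is 0.5 in both samples. A count-only pipeline that assumes one fixed gene length would report roughly a 2.3-fold change here. The per-sample length offset is what guards against that.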


I have different concerns, actually:

Count data usually come from alignment to the genome, whereas salmon estimates come from alignment to the transcriptome. I found that the tximport-converted counts did not really match the counts based on genome alignment...

Reply by capricygcapricyg

What would be the point of tximport if you got the same thing as with genome-based alignment? Put another way, aligning to the genome and then counting, and aligning to the transcriptome and then collapsing to the gene level, are both attempts to estimate the same thing: the relative amount of transcript for each gene in a given sample. But we don't know how much transcript there actually is!

The fact that two different methods of estimating some underlying (unobserved) quantity don't necessarily agree doesn't invalidate either of them, because we don't know what the ground truth is. If you believe that aligning to the genome and then generating counts is 'the right way to do things', then you should do that. If you are persuaded by Mike's paper that you get better results by aligning to the transcriptome and then summarizing with tximport, then you should do that instead. But comparing the two and noting that they differ doesn't tell you anything, because the whole reason for having a new method is that it differs from what came before.

Reply by James W. MacDonald

Hi James,

As you mentioned, we don't know what the ground truth is. Since the outputs differ, I would just like to know whether anyone has tested which one makes more sense...

C.

Reply by capricygcapricyg

Yes. Mike did, in the tximport paper that he already mentioned. Have you read it?