Question

combining multiple salmon runs after tximport


Ivana • 0

@d6ea7234

Last seen 2.1 years ago

Canada

I have completed multiple RNA-seq experiments using a targeted approach (a custom Illumina panel covering only genes of interest). I processed each run with salmon and then tximport, using a targeted tx2gene file containing only the genes of interest.

txi <- tximport(files, type = "salmon", tx2gene = t2gene, countsFromAbundance = "lengthScaledTPM")

I chose "lengthScaledTPM" to account for both transcript length and library size. My first run included only 12 samples and my last included 48, which further encouraged me to use the "lengthScaledTPM" argument. Note that I continue to receive samples for this study (the second phase will begin soon). Incoming samples will arrive in different batches, and I will run them as I accumulate n=12 (if many arrive at once, I can run as many as 48).
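To make concrete what the two scaling options do, here is a minimal base-R sketch of the rescaling as I understand tximport to perform it: abundances (TPM) are optionally multiplied by each gene's average length across the samples in the call, then each column is rescaled to the original library size. The function name `counts_from_abundance` and all toy values are hypothetical, for illustration only; this is not the package's own code.

```r
# Toy inputs (hypothetical values): 3 genes x 2 samples.
dn <- list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
abundance <- matrix(c(1e5, 3e5, 6e5,  2e5, 2e5, 6e5), nrow = 3, dimnames = dn)  # TPM
counts    <- matrix(c(50, 150, 300,   400, 400, 1200), nrow = 3, dimnames = dn) # estimated counts
eff_len   <- matrix(c(1000, 2000, 500, 1200, 2000, 500), nrow = 3, dimnames = dn) # effective lengths

# Hypothetical helper sketching the two countsFromAbundance modes.
counts_from_abundance <- function(abundance, counts, eff_len,
                                  method = c("lengthScaledTPM", "scaledTPM")) {
  method <- match.arg(method)
  new_counts <- if (method == "lengthScaledTPM") {
    # Weight each gene by its average length over the samples in THIS call.
    abundance * rowMeans(eff_len)
  } else {
    abundance
  }
  # Rescale every sample so its column sum equals the original library size.
  lib_size <- colSums(counts)
  sweep(new_counts, 2, lib_size / colSums(new_counts), "*")
}

ls_tpm <- counts_from_abundance(abundance, counts, eff_len, "lengthScaledTPM")
s_tpm  <- counts_from_abundance(abundance, counts, eff_len, "scaledTPM")
```

Because the length weight is an average over the samples passed in together, the per-gene weights depend on which samples are imported in the same call.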

Some on the forum have stated they avoid using the transcript length correction because they do not feel confident about the transcript-specific values (I believe they mean the assignment of a particular read to one transcript for a gene versus another for that same gene). It is possible I misunderstood the comment.

So my main question is:

Is it appropriate to select the lengthScaledTPM option for the targeted custom panel sequencing approach and then combine the txi$counts matrices for downstream comparison of samples from different runs? I care about both within-sample and between-sample gene differences.
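For reference, combining per-run count matrices could look like the base-R sketch below. It assumes each run's txi$counts is a gene-by-sample matrix with gene IDs as row names; `combine_runs` and the toy matrices are hypothetical names and values for illustration. Rows are matched by gene ID rather than by position, and a batch factor records which run each sample came from.

```r
# Hypothetical helper: column-bind count matrices on their shared gene IDs.
combine_runs <- function(...) {
  mats <- list(...)
  shared <- Reduce(intersect, lapply(mats, rownames))
  do.call(cbind, lapply(mats, function(m) m[shared, , drop = FALSE]))
}

# Toy stand-ins for txi$counts from two separate runs (made-up values).
run1 <- matrix(c(10, 20, 30, 11, 21, 31), nrow = 3,
               dimnames = list(c("geneA", "geneB", "geneC"), c("r1_s1", "r1_s2")))
run2 <- matrix(c(5, 15, 25), nrow = 3,
               dimnames = list(c("geneC", "geneA", "geneB"), "r2_s1"))

combined <- combine_runs(run1, run2)

# Record the run of origin so it can be modelled as a batch term downstream.
batch <- factor(rep(c("run1", "run2"), times = c(ncol(run1), ncol(run2))))
```

An alternative that may be worth considering is to pass the quantification files from all runs to a single tximport call, so that any length-based scaling uses the same set of samples throughout.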

My other question concerns the forum comment about the lack of confidence in transcript-level values. If geneA has 5 transcripts and sample1 gives me a zero for transcript3 while sample3 gives me a zero for transcript4, how confident should I be in those zeros, given the quasi-mapping nature of salmon? Would this potential uncertainty be a reason to use scaledTPM, which only addresses library size, instead of lengthScaledTPM?

Thank you in advance for any help on this :)

tximport RNASeq salmon • 1.5k views

Updated 2.1 years ago by ATpoint ★ 4.8k • written 2.1 years ago by Ivana • 0

score 0 · Answer 1 · 2023-02-17


ATpoint ★ 4.8k

@atpoint-13662

Last seen 19 hours ago

Germany

The basic rule is simple: if the protocol introduces length bias, correct for it; otherwise don't. Check the kit specifications to see how it captures transcripts. If, for example, it only captures 3' ends, then we expect no length bias; see the tximport vignette for end-tagged data. If the kit captures full-length transcripts, then correcting is reasonable and lengthScaledTPM makes sense.
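The rule above can be sketched as a small helper that maps the capture protocol to a countsFromAbundance value. The function name and the protocol labels are hypothetical; "no" is tximport's default, which returns the original estimated counts without length scaling.

```r
# Hypothetical helper: pick countsFromAbundance from the kit's capture behaviour.
choose_counts_from_abundance <- function(protocol = c("full_length", "three_prime_tagged")) {
  protocol <- match.arg(protocol)
  switch(protocol,
         full_length        = "lengthScaledTPM", # length bias expected: correct for it
         three_prime_tagged = "no")              # no length bias expected: keep original counts
}

cfa <- choose_counts_from_abundance("full_length")

# Usage with the call from the question (needs tximport and real files, so left commented):
# txi <- tximport(files, type = "salmon", tx2gene = t2gene, countsFromAbundance = cfa)
```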