Question

Can tximport be used to mitigate sample-specific length biases?

abf ▴ 30

@abf-14661

United States

A recent publication in PLoS Biology documents sample-specific biases related to gene length in differential expression analyses:

Recurrent functional misinterpretation of RNA-seq data caused by sample-specific gene length bias

Could this issue be mitigated by using a transcript-aware quantification tool such as Salmon, StringTie, kallisto, or RSEM, and calculating offsets with tximport?

RNASeq tximport • 768 views

updated 5.4 years ago by Michael Love 43k • written 5.4 years ago by abf ▴ 30

score 3 · Accepted Answer · 2019-11-22

Thanks for posting. I think the sample-specific biases shown in the paper could be addressed by tximport through its effective length offset, provided the upstream quantification method captures the bias in one of the sample-specific terms it estimates.
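To make the idea of an effective length offset concrete, here is a minimal sketch in plain Python of the arithmetic only. It is not tximport's R code: the function name `length_offsets` and the toy numbers are made up for illustration, and the library-size component that a real offset also includes is left out. The assumption is that each gene's per-sample effective length is centered on its geometric mean across samples, on the log scale.

```python
import math

def length_offsets(lengths):
    """Return log-scale, per-gene centered length offsets.

    lengths: list of rows (one per gene), each a list of per-sample
    average effective lengths (all > 0). Hypothetical helper.
    """
    offsets = []
    for row in lengths:
        log_row = [math.log(x) for x in row]
        # Log of the geometric mean across samples for this gene.
        log_geo_mean = sum(log_row) / len(log_row)
        # Positive offset: this sample's effective length is longer than usual,
        # so more fragments are expected at the same expression level.
        offsets.append([v - log_geo_mean for v in log_row])
    return offsets

lengths = [
    [1000.0, 1000.0, 1000.0],  # no sample-specific length difference
    [2000.0, 1000.0, 500.0],   # effective length differs by sample
]
offsets = length_offsets(lengths)
```

A gene whose effective length is the same in every sample gets offsets of zero, so only genes with sample-specific length differences are adjusted.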

I'm most familiar with Salmon, which estimates a fragment length distribution (FLD) term by default and has an optional position bias term that can be estimated per sample (--posBias). The positional bias model is flexible across short and long transcripts because it bins transcripts by length, as suggested by Roberts et al. (2011). I believe these two terms should capture the effects on downstream gene counts and gene lengths described in this paper. I believe RSEM also has an optional sample-specific positional bias term, and most methods have a sample-specific FLD term.
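As a rough illustration of why a sample-specific FLD produces a length-dependent effect, here is a small Python sketch of a textbook-style effective length calculation. It is a simplification under stated assumptions, not Salmon's implementation: the function `effective_length`, the two toy FLDs, and the fallback for transcripts shorter than every fragment are all hypothetical, and positional bias is not modeled.

```python
def effective_length(tx_len, fld):
    """Expected number of start positions for a fragment on a transcript.

    tx_len: transcript length in bases.
    fld: dict mapping fragment length -> probability (toy FLD).
    Only fragments that fit in the transcript are considered.
    """
    fits = {frag: p for frag, p in fld.items() if frag <= tx_len}
    mass = sum(fits.values())
    if mass == 0:
        # Assumption for this sketch: fall back to the raw length.
        return float(tx_len)
    return sum(p * (tx_len - frag + 1) for frag, p in fits.items()) / mass

# Two samples with different (made-up) fragment length distributions.
fld_sample_a = {150: 0.5, 200: 0.5}
fld_sample_b = {250: 0.5, 300: 0.5}

short_a = effective_length(400, fld_sample_a)   # short transcript, sample A
short_b = effective_length(400, fld_sample_b)   # short transcript, sample B
long_a = effective_length(5000, fld_sample_a)   # long transcript, sample A
long_b = effective_length(5000, fld_sample_b)   # long transcript, sample B

short_ratio = short_a / short_b
long_ratio = long_a / long_b
```

With these toy numbers the effective length of the short transcript differs far more between the two samples than that of the long transcript, which is the kind of length-dependent, sample-specific effect the offset is meant to absorb.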

You could try it out, and then run CQN or EDASeq on the estimated counts you get with tximport and countsFromAbundance="lengthScaledTPM" to see if the biases are effectively removed.
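For readers wondering what the lengthScaledTPM counts contain, here is a sketch of the arithmetic as I understand it, written in plain Python rather than as tximport's actual R code. The function name `length_scaled_tpm` and the toy matrices are hypothetical. The assumption is that abundances (TPM) are multiplied by each feature's average length across samples, and each sample's column is then rescaled to its original total of estimated counts.

```python
def length_scaled_tpm(counts, abundance, lengths):
    """Sketch of length-scaled TPM counts.

    counts, abundance, lengths: lists of rows (features), each a list of
    per-sample values. Returns a matrix of the same shape.
    """
    n_features = len(counts)
    n_samples = len(counts[0])
    # Average length of each feature across samples.
    mean_len = [sum(row) / n_samples for row in lengths]
    # Scale TPM by the across-sample average length.
    new = [[abundance[g][s] * mean_len[g] for s in range(n_samples)]
           for g in range(n_features)]
    scaled = [[0.0] * n_samples for _ in range(n_features)]
    for s in range(n_samples):
        counts_sum = sum(counts[g][s] for g in range(n_features))
        new_sum = sum(new[g][s] for g in range(n_features))
        # Rescale so each sample keeps its original library size.
        factor = counts_sum / new_sum
        for g in range(n_features):
            scaled[g][s] = new[g][s] * factor
    return scaled

counts = [[100.0, 50.0], [300.0, 450.0]]
abundance = [[200000.0, 100000.0], [800000.0, 900000.0]]
lengths = [[500.0, 1000.0], [2000.0, 2000.0]]
scaled = length_scaled_tpm(counts, abundance, lengths)
```

Because the scaling uses a single average length per feature rather than the per-sample length, the sample-specific length differences no longer appear in the resulting counts, which is why these are the counts to hand to a downstream bias check.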

If you see a residual bias, you can always use the offset from CQN or EDASeq as well. I suppose that if you want the two methods to work together to eliminate the bias, you should provide the lengthScaledTPM counts to CQN / EDASeq, so that they do not over-adjust for biases already corrected by the effective length correction that tximport calculates.