Question

DESeq2 - transcript length correction necessary when quantifying reads from ends of transcripts?

marius.wenzel • 0

@84e705e7

United Kingdom

Hi all,

I'm dealing with an unusual case of DGE, where I'm interested in quantifying reads that come from particular regions at transcript ends instead of from the whole transcript. I'm screening standard Illumina RNA-Seq reads for these regions and quantifying them separately from all other reads (background reads). I'm then doing a gene-based DGE analysis, but I am wondering whether this analysis might be affected by differences in isoform usage between treatment groups.

What if one treatment group uses longer isoforms than the other group? In standard DGE, transcript length matters because longer transcripts produce more reads, but I can't quite get my head around whether transcript length affects the number of reads I get at transcript ends. My intuition is that the number of reads at a transcript end depends only on the length of the nucleotide sequence I'm screening for (say, 30 bp); if one treatment group switches to longer isoforms, I will see more background reads but the same number of reads in the 30 bp region. This would mean I don't need length correction for my counts. Is this correct, or should I be doing a transcript-level expression analysis that accounts for transcript length?

Thank you very much, Marius

Transcriptomics DESeq2 • 1.5k views

ADD COMMENT • link 2.7 years ago marius.wenzel • 0

score 0 · Answer 1 · 2022-07-25

ATpoint ★ 4.8k

@atpoint-13662

Germany

You need no length correction for end-tagged data; see the 3' tagged RNA-seq section of the tximport vignette:

https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#3%E2%80%99_tagged_RNA-seq

If you have 3’ tagged RNA-seq data, then correcting the counts for gene length will induce a bias in your analysis, because the counts do not have length bias. Instead of using the default full-transcript-length pipeline, we recommend to use the original counts, e.g. txi$counts as a counts matrix, e.g. providing to DESeqDataSetFromMatrix or to the edgeR or limma functions without calculating an offset and without using countsFromAbundance.

ADD COMMENT • link 2.7 years ago ATpoint ★ 4.8k

Yes, if you are just counting in a fixed length window, you don't need length correction for those counts.

You may want to do some QC to make sure that you have good 3' coverage; coverage metrics can be generated with, e.g., RNA-SeQC.
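
The fixed-window argument can be checked with a toy model. The Python sketch below is an illustration only, not part of any package discussed in this thread. It assumes that reads start uniformly at every valid position of a transcript, that a read counts towards the end region if it overlaps it by at least one base, and that both isoforms are present at the same number of molecules. The function name `read_start_counts` and the 100 bp read length are hypothetical choices; the 30 bp window is taken from the question.

```python
def read_start_counts(transcript_len, read_len, window_len):
    """Count read start positions on one transcript under a toy uniform model.

    Returns (total_starts, end_window_starts), where end_window_starts is the
    number of start positions whose read overlaps the last `window_len` bases.
    All parameters are illustrative assumptions, not values from real data.
    """
    if transcript_len < read_len + window_len:
        raise ValueError("transcript must be at least read_len + window_len long")
    starts = range(transcript_len - read_len + 1)  # every valid read start
    window_start = transcript_len - window_len     # first base of the end window
    # A read starting at s covers bases s .. s + read_len - 1.
    end_hits = sum(1 for s in starts if s + read_len - 1 >= window_start)
    return len(starts), end_hits


short_iso = read_start_counts(1000, 100, 30)  # short isoform
long_iso = read_start_counts(3000, 100, 30)   # long isoform, same end window
print(short_iso)  # (901, 30)
print(long_iso)   # (2901, 30)
```

In this idealized setting the longer isoform offers roughly three times as many read start positions overall, but the number of positions that hit the end window stays at 30 for both. Real libraries have non-uniform coverage, which is one reason the 3' coverage QC suggested here is still worthwhile.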

ADD REPLY • link 2.7 years ago Michael Love 43k

Thank you, Michael and ATpoint. I can see the parallels between 3'-tagged data and what I'm doing, but wasn't sure if they are equivalent. It's great news that my counts are not affected by changes in isoform length. Thank you for your input!

ADD REPLY • link 2.7 years ago marius.wenzel • 0