Hi all,
I'm dealing with an unusual case of differential gene expression (DGE) analysis, where I'm interested in quantifying reads that come from particular regions at transcript ends rather than from the whole transcript. I'm screening standard Illumina RNA-Seq reads for these regions and quantifying them separately from all other reads (background reads). I then run a gene-based DGE analysis, but I'm wondering whether it might be affected by differences in isoform usage between treatment groups.
What if one treatment group uses longer isoforms than the other group? In standard DGE, transcript length matters because longer transcripts produce more reads, but I can't quite get my head around whether transcript length affects the number of reads I get at transcript ends. My intuition is that the number of reads at a transcript end depends only on the length of the nucleotide sequence I'm screening for (say, 30 bp); if one treatment group switches to longer isoforms, I will see more background reads but the same number of reads in that 30 bp window. If so, I don't need length correction for my counts. Is this correct, or should I be doing transcript-level expression analysis that accounts for transcript length?
Thank you very much, Marius
Yes, if you are just counting in a fixed length window, you don't need length correction for those counts.
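To make the reasoning concrete, here is a small toy calculation in Python. It is only a sketch under simplifying assumptions that are mine, not from the thread: reads start uniformly along each transcript molecule, every read fits entirely within the transcript, and there is no 3' bias or degradation. The function name `expected_reads` and all parameter values are hypothetical.

```python
def expected_reads(transcript_len, n_molecules, read_len=100, window=30,
                   rate_per_start=0.01):
    """Expected read counts under a uniform-coverage toy model.

    Assumption: each possible read start position on each molecule yields
    `rate_per_start` reads on average. Returns (total_reads, window_reads),
    where window_reads counts reads overlapping the last `window` bases.
    """
    # Number of start positions (0-based) at which a full-length read fits.
    n_starts = transcript_len - read_len + 1
    if n_starts < 1:
        raise ValueError("transcript is shorter than the read length")

    # A read starting at s covers [s, s + read_len). It overlaps the terminal
    # window [transcript_len - window, transcript_len) when
    # s >= transcript_len - window - read_len + 1.
    first_overlapping_start = max(0, transcript_len - window - read_len + 1)
    n_window_starts = n_starts - first_overlapping_start

    total_reads = n_molecules * rate_per_start * n_starts
    window_reads = n_molecules * rate_per_start * n_window_starts
    return total_reads, window_reads


# Same number of molecules, but one group uses a 3x longer isoform.
short_isoform = expected_reads(transcript_len=1000, n_molecules=1000)
long_isoform = expected_reads(transcript_len=3000, n_molecules=1000)

print("short isoform (total, window):", short_isoform)
print("long isoform  (total, window):", long_isoform)
```

In this model the total (background) count grows with transcript length, while the window count depends only on the window size and the number of molecules. Real libraries deviate from uniform coverage, which is why checking coverage at the transcript ends is still worthwhile.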
You may want to do some QC to make sure that you have good 3' coverage; coverage metrics can be generated with e.g. RNA-SeQC.
Thank you, Michael and ATpoint. I can see the parallels between 3'-tagged data and what I'm doing, but wasn't sure if they are equivalent. It's great news that my counts are not affected by changes in isoform length. Thank you for your input!