Hello,
I am using Rsubread's featureCounts() to quantify the genes in my RNA-Seq data. There are two options: count "gene" features directly, or count "exon" features, which are then aggregated into gene counts when counting "meta-features" instead of features. Why is summing exon counts preferred to counting genes directly? I am having trouble finding the answer.
Thank You, Sara
Yes, I was confused a few weeks ago about several aspects of quantification. I am now reviewing my work and wanted to know exactly why it is better to quantify at the "exon level". You are right: I had looked into this and misinterpreted it. Thank you for the detailed answer. Now I understand, and it is clear why I was having trouble finding an answer to my faulty question.
If you open the GTF file in an editor, you will see why the featureCounts options are as they are, even though they might not seem intuitive at first glance. featureCounts() needs to know which rows of the GTF to use. For each gene, the GTF has a row labelled "gene", which specifies the entire gene range from TSS to TES, and one or more rows labelled "exon". For almost all purposes the "exon" rows are the important ones, but for completeness featureCounts gives the option of using the "gene" rows instead by setting the GTF.featureType option. The names "gene" and "exon" here refer to the row labels (feature types) in the GTF file, not to whether featureCounts returns gene-level or exon-level counts.
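To make the point about GTF rows concrete, here is a minimal sketch in base R. The three-line GTF fragment, the gene name "geneA", the coordinates, and the file paths in the commented-out call are all made up for illustration; they are not taken from any real annotation.

```r
# Toy GTF fragment (hypothetical): one "gene" row and two "exon" rows for geneA.
gtf_lines <- c(
  "chr1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id \"geneA\";",
  "chr1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id \"geneA\"; exon_number \"1\";",
  "chr1\ttoy\texon\t700\t900\t.\t+\t.\tgene_id \"geneA\"; exon_number \"2\";"
)

# Read the fragment as a tab-separated table; quote = "" keeps the
# double quotes inside the attribute column as literal characters.
gtf <- read.delim(
  text = gtf_lines, header = FALSE, quote = "", stringsAsFactors = FALSE,
  col.names = c("seqname", "source", "feature", "start", "end",
                "score", "strand", "frame", "attribute")
)

# The third column holds the row label that GTF.featureType selects on.
feature_types <- table(gtf$feature)

# The "gene" row spans the whole locus, introns included;
# the "exon" rows cover only the exonic bases.
widths     <- gtf$end - gtf$start + 1
gene_span  <- sum(widths[gtf$feature == "gene"])   # 801
exon_total <- sum(widths[gtf$feature == "exon"])   # 402

# Not run here (needs Rsubread and real files; both paths are hypothetical):
# library(Rsubread)
# fc <- featureCounts(files = "sample.bam",
#                     annot.ext = "annotation.gtf",
#                     isGTFAnnotationFile = TRUE,
#                     GTF.featureType = "exon",    # which GTF rows to use
#                     GTF.attrType = "gene_id",    # attribute that groups rows into genes
#                     useMetaFeatures = TRUE)      # aggregate exon rows per gene
```

In this toy case the "gene" row covers 801 bases while the two "exon" rows cover 402, so selecting "gene" rows would also count reads falling in the intron. With GTF.featureType = "exon" and useMetaFeatures = TRUE, the result is still one count per gene.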