Question

Why is it better to count genes at the exon-level using subread::featureCounts()?

0

Sara • 0

@41d09ed8

Last seen 15 months ago

United States

Hello,

I am using Rsubread's featureCounts() to quantify the genes in my RNA-Seq data. There are two options: count "genes" directly, or count "exons", which are then aggregated to gene counts when counting "meta-features" instead of features. Why is summing exon counts preferred to counting genes directly? I am having trouble finding the answer.

Thank You, Sara

RNASeq Rsubread featureCounts Quantification • 6.0k views

ADD COMMENT • link updated 2.3 years ago by Gordon Smyth 52k • written 2.3 years ago by Sara • 0

score 2 · Answer 1 · 2023-01-11

2

Gordon Smyth 52k

@gordon-smyth

Last seen 7 hours ago

WEHI, Melbourne, Australia

I feel that I answered your question previously (see "Quantification of Genes with RSubread::featureCounts() at exon-level vs gene-level?"), but I will try again. You are still misinterpreting how the read counting works, and I hope that a few more words will clarify things for you. You don't say which code options you are considering, but I will assume they are the same as in your previous question.

featureCounts does not count reads at the exon level and then add them up to get gene-level counts. The choice you are referring to is instead between:

Counting reads over whole gene bodies (from TSS to TES), including both exons and introns
Or counting only reads that arise from the expressed part of each gene.

We recommend the latter approach because counting reads that map entirely to introns tends to add noise. Neither of these approaches is equivalent to exon-level counting.
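
To make the distinction concrete, here is a minimal toy sketch in Python. It is not featureCounts itself: the gene, exon coordinates and reads are all made up, reads are treated as simple closed intervals, and strand, spliced alignments and ambiguity handling are ignored. It only illustrates that both options return one count per gene, and that the gene-body option additionally picks up reads that fall entirely within introns.

```python
# Toy model only: hypothetical coordinates, single chromosome, no strand.

def overlaps(read, feature):
    """True if two closed intervals (start, end) share at least one base."""
    return read[0] <= feature[1] and feature[0] <= read[1]

def count_reads(reads, features):
    """Count reads overlapping at least one feature; each read counts once."""
    return sum(any(overlaps(r, f) for f in features) for r in reads)

# Hypothetical gene: two exons separated by an intron.
gene_body = [(100, 900)]            # whole gene range, TSS to TES
exons = [(100, 300), (700, 900)]    # expressed part of the gene

reads = [
    (120, 170),  # inside exon 1
    (280, 330),  # spans the exon 1 / intron boundary
    (450, 500),  # entirely intronic
    (520, 570),  # entirely intronic
    (750, 800),  # inside exon 2
]

gene_body_count = count_reads(reads, gene_body)
exon_based_count = count_reads(reads, exons)

print(gene_body_count)   # 5
print(exon_based_count)  # 3
```

In this toy case the two purely intronic reads are the only difference between the two gene-level counts. In both cases a read contributes at most once to the gene, so the exon-based result is a gene-level count rather than a table of per-exon counts.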

ADD COMMENT • link 2.3 years ago Gordon Smyth 52k

0

Yes, I was confused weeks ago about several aspects of quantification. I am now reviewing my work and wanted to know exactly why it is better to quantify at "exon-level". You are right: I had looked into this and misinterpreted it. Thank you for the detailed answer. I now understand, and it is clear why I was having trouble finding an answer to my faulty question.

ADD REPLY • link 2.3 years ago Sara • 0

1

If you open up the GTF file in an editor you will see why the featureCounts options are as they are, even though they might not seem intuitive at first glance. featureCounts() needs to know which rows of the GTF to use. For each gene, the GTF has a row labelled "gene", which specifies the entire gene range from TSS to TES, and one or more rows labelled "exon". For almost all purposes the "exon" rows are the important ones, but for completeness featureCounts gives the option of using the "gene" rows instead by setting the GTF.featureType option. The names "gene" and "exon" refer here to the row labels (the feature-type column) of the GTF file, not to whether featureCounts is returning gene-level or exon-level counts.
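
As a rough illustration of what "which rows of the GTF to use" means, here is a small Python sketch that selects rows by their feature-type label and groups them by gene. The three GTF rows are invented for the example, and the helper names `gene_id_of` and `select_rows` are hypothetical; this mimics the idea behind GTF.featureType and is not how featureCounts is implemented.

```python
from collections import defaultdict

# Hypothetical GTF rows (tab-separated, 9 columns); gene and coordinates are made up.
gtf_text = "\n".join([
    'chr1\ttoy\tgene\t100\t900\t.\t+\t.\tgene_id "geneA";',
    'chr1\ttoy\texon\t100\t300\t.\t+\t.\tgene_id "geneA"; exon_number "1";',
    'chr1\ttoy\texon\t700\t900\t.\t+\t.\tgene_id "geneA"; exon_number "2";',
])

def gene_id_of(attributes):
    """Extract the gene_id value from the GTF attribute column."""
    for field in attributes.split(";"):
        field = field.strip()
        if field.startswith("gene_id "):
            return field.split(" ", 1)[1].strip('"')
    return None

def select_rows(gtf, feature_type):
    """Collect (start, end) of rows whose third column equals feature_type, per gene."""
    by_gene = defaultdict(list)
    for line in gtf.splitlines():
        cols = line.split("\t")
        if cols[2] == feature_type:
            by_gene[gene_id_of(cols[8])].append((int(cols[3]), int(cols[4])))
    return dict(by_gene)

exon_rows = select_rows(gtf_text, "exon")
gene_rows = select_rows(gtf_text, "gene")

print(exon_rows)  # {'geneA': [(100, 300), (700, 900)]}
print(gene_rows)  # {'geneA': [(100, 900)]}
```

Either selection is keyed by gene, which is why both choices lead to gene-level counts: the "exon" rows describe the expressed part of geneA, and the "gene" row describes its whole range including the intron.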

ADD REPLY • link 2.3 years ago Gordon Smyth 52k