Question

Quantification of Genes with RSubread::featureCounts() at exon-level vs gene-level?

0

Entering edit mode

Sara • 0

@41d09ed8

Last seen 14 months ago

United States

Hello,

I am following along with an RNA-Seq Pipeline with ambiguous steps. I was told they quantification was performed like so: "We used subread to summarize counts to the gene level."

These are the parameters that I am using:

mock1 <- featureCounts(files="mock1_Aligned.sortedByCoord.out.bam",isPairedEnd=TRUE, GTF.featureType="exon",
 GTF.attrType="gene_id", annot.ext= "gencode.v41.primary_assembly.annotation.gtf", isGTFAnnotationFile=TRUE, 
countMultiMappingReads = FALSE, countReadPairs=T, minMQS = 30, largestOverlap = T)

So my reads are being summarized at the gene-level correct because the default is to count meta-features?

I am confused because the GTF.featureType is set to "exon" and not gene? I have performed Differential expression with GTF.featureType set to both exon and gene and get differences in my top-DE genes. I would expect the results to be similar if exon counts are being aggregated at the gene-level... I feel like keeping the largest Overlap could cause differences because in one situation it is the overlap for genes and it is the overlap for exons in another. How should I deal with overlapping fragments for the most accurate results?

Here is the status report for gene feature quantification:

Status  mock
Assigned                  15877654
Unassigned_Unmapped       73946
Unassigned_Read_Type            0
Unassigned_Singleton    0
Unassigned_MappingQuality     3571086
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping      10826366
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures         963219
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity          3536273

And for exon-feature:

Status  mock1
Assigned                15795944
Unassigned_Unmapped     73946
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality    3571086
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 10826366
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures        3532032
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity        1049170

Counts for counting exons:

    gene mock1
ENSG00000223972.5   0
ENSG00000227232.5   28
ENSG00000278267.1   0
ENSG00000243485.5   22

Counts for counting genes:

    mock1
ENSG00000223972.5   0
ENSG00000227232.5   86
ENSG00000278267.1   0
ENSG00000243485.5   19

Why are the counts significantly higher for ENSG00000227232.5 when the GTF.featureType is set to "gene". There are more exons than genes so I figured the count would be higher for exon?

Thanks, Sara

Subread RNASeq featureCounts quantification • 3.3k views

ADD COMMENT • link updated 2.4 years ago by Gordon Smyth 52k • written 2.4 years ago by Sara • 0

score 1 · Answer 1 · 2022-11-25

GTF.featureType should be set to `"exon" regardless of whether you want to count at the exon or gene level.

Apart from specifying your files by

files = "mock1_Aligned.sortedByCoord.out.bam",
isPairedEnd = TRUE, 
annot.ext = "gencode.v41.primary_assembly.annotation.gtf",
isGTFAnnotationFile = TRUE,

you should leave all arguments at their default values. All arguments are set to defaults that are recommended and used by the featureCounts authors themselves, so you should not change the settings unless your situation is unusual (which yours is not) or you are an expert.

Setting GTF.featureType="gene" tells featureCounts to count all reads that map anywhere in the gene body including introns as well as exons, so naturally it will give higher counts.