Hello,
I am following along with an RNA-Seq Pipeline with ambiguous steps. I was told they quantification was performed like so: "We used subread to summarize counts to the gene level."
These are the parameters that I am using:
mock1 <- featureCounts(files="mock1_Aligned.sortedByCoord.out.bam",isPairedEnd=TRUE, GTF.featureType="exon",
GTF.attrType="gene_id", annot.ext= "gencode.v41.primary_assembly.annotation.gtf", isGTFAnnotationFile=TRUE,
countMultiMappingReads = FALSE, countReadPairs=T, minMQS = 30, largestOverlap = T)
So my reads are being summarized at the gene-level correct because the default is to count meta-features?
I am confused because the GTF.featureType is set to "exon" and not gene? I have performed Differential expression with GTF.featureType set to both exon and gene and get differences in my top-DE genes. I would expect the results to be similar if exon counts are being aggregated at the gene-level... I feel like keeping the largest Overlap could cause differences because in one situation it is the overlap for genes and it is the overlap for exons in another. How should I deal with overlapping fragments for the most accurate results?
Here is the status report for gene feature quantification:
Status mock
Assigned 15877654
Unassigned_Unmapped 73946
Unassigned_Read_Type 0
Unassigned_Singleton 0
Unassigned_MappingQuality 3571086
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 10826366
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 963219
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 3536273
And for exon-feature:
Status mock1
Assigned 15795944
Unassigned_Unmapped 73946
Unassigned_Read_Type 0
Unassigned_Singleton 0
Unassigned_MappingQuality 3571086
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 10826366
Unassigned_Secondary 0
Unassigned_NonSplit 0
Unassigned_NoFeatures 3532032
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 1049170
Counts for counting exons:
gene mock1
ENSG00000223972.5 0
ENSG00000227232.5 28
ENSG00000278267.1 0
ENSG00000243485.5 22
Counts for counting genes:
mock1
ENSG00000223972.5 0
ENSG00000227232.5 86
ENSG00000278267.1 0
ENSG00000243485.5 19
Why are the counts significantly higher for ENSG00000227232.5 when the GTF.featureType is set to "gene". There are more exons than genes so I figured the count would be higher for exon?
Thanks, Sara