Featurecounts issue for counting exons: Not matching total counts mapped for specific gene
1
0
Entering edit mode
A ▴ 40
@a-14337
Last seen 14 months ago
United Kingdom

Hi all, 

I am having a weird issue (weird to me being a new R user) using featurecounts to generate counts for genes but also for individual exons. I have successfully mapped gene counts to their respective genes and the total reads mapped and annotated to genes match those generated by a bioinformatics department so its a good quality control. 

I also want to generate counts for individual exons so that I can test the knockdown of one of our genes at specific exons in a mouse model. The issue is that the exon counts do not add up or match I am seeing with total read mapped to genes. The code I am using is as follows:

For annotating genes:

fc <- featureCounts(bamDir, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.94.chr.gtf", isPairedEnd = FALSE)

Count result for gene of interest = 

ENSMUSG00000053080 = 1158

 

This matches results obtained from bioinformatcis department  for the gene above as well as all other genes (with +/- of 5 counts, I am guessing for parameters they have used in their script which I am not )...

However, when I attempt to annotate exons specifically, I am running in to trouble for two reasons. The total counts generated and also the amount of exons being incorrect (possible duplicates?)

The code I am using is as follows:

fc <- featureCounts(bamDir, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.94.chr.gtf", isPairedEnd = FALSE, useMetaFeatures = FALSE) OR

fc <- featureCounts(bamDir, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.94.chr.gtf", isPairedEnd = FALSE, useMetaFeatures = "exons")

These both produce the following results: 

 

ENSMUSG00000053080 3
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 71
ENSMUSG00000053080 55
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 0
ENSMUSG00000053080 8
ENSMUSG00000053080 0
ENSMUSG00000053080

0

 

This is icnredible confusing as there are only 5 exons for this gene, and clearly the counts are far too low to reflect the total counts annotated to the gene as a whole.. Can anybody advice what I might be doing wrong here and why this result is being produced?

 

And finally, is there an intuitive way of plotting counts per exons in an easy to follow way, maybe thorugh ggplo2 just plotting counts on y axis by exon number on x axis? or a way of passing this festurecount result to ggbio?

Thank you for your help!

 

 

featurecounts Rsubread ggbio • 3.5k views
ADD COMMENT
0
Entering edit mode
Wei Shi ★ 3.6k
@wei-shi-2183
Last seen 4 weeks ago
Australia/Melbourne

Yes as you suspect there should be duplicates in the annotated exons for this gene. You can check this by looking at the exon annotation data featureCounts extracted from the GTF file (the extracted annotation is saved in the component fc$annotation).

When you count reads at exon level, by default a read overlapping more than one exon is not counted. Therefore all duplicate exons will receive zero count. Also junction reads will not be counted either because they span more than one exon. However if you set allowMultiOverlap=TRUE you will get all these reads counted, but the total number of reads counted to exons will be greater than the total number of reads counted to the gene.

For gene level counting, reads overlapping duplicate exons or multiple different exons are always counted once only (the allowMultiOverlap parameter controls how to count those reads overlapping more than one gene instead of more than one exon).

You can use the flattenGTF function in Rsubread to flatten your annotation before you run featureCounts

ADD COMMENT
0
Entering edit mode

Hi Wei, 

 

Thank you so much for your quick response! I really appreciate the clear explanation. May I ask one quick follow up question please.. Given that reads span exon junctions, looking at the annotation slot, similar start sites have different site ends... So how does one tackle this to produce a total number of counts for the correct amount of exons in the gene.. Is it to merge the duplicates?

 

Or is this specifically what flattenGTF will do by merging non-overlapping exon bins? If so, do I need to pass the flattened GTF file which is a GFF? because passing the GTF path after flattening GTF produces the same result with duplicated exons...

 

Thank you!!

ADD REPLY
0
Entering edit mode

I would recommend you to merge the duplicated or overlapping exons. This is what flattenGTF function does. Output of this function is a data.frame object that contains a SAF format annotation. Take a look at the help page for flattenGTF for more information. You can then feed the data.frame object to featureCounts for counting.

Or you may directly use the inbuilt annotation for GRCm38/mm10 in featureCounts. The inbuilt annotation has all overlapping/duplicated exons merged. By default, featureCounts uses the inbuilt mm10 annotation to count reads (annot.inbuilt = "mm10"). 

ADD REPLY
0
Entering edit mode

thank you so much for your assistance! Really appreciate this, it has been so helpful!

ADD REPLY

Login before adding your answer.

Traffic: 765 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6