I extracted exons from TxDb.Mmusculus.UCSC.mm9.knownGene and org.Mm.eg.db and found two problems: featureCounts from the Rsubread package misses some exons, and according to the Genome Browser some exons do not belong to the gene that those packages assign them to (or are not included in the featureCounts analysis).
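library(TxDb.Mmusculus.UCSC.mm9.knownGene)
library(org.Mm.eg.db)
# All exons in the mm9 knownGene annotation
mm9 = TxDb.Mmusculus.UCSC.mm9.knownGene
exon = exons(mm9)
exon_ranges = ranges(exon)
# Map each exon to its Entrez gene ID and transcript name
gene_id_exons = select(mm9, keys=as.character(exon$exon_id), columns = c("GENEID","TXNAME"), keytype = "EXONID")
names(gene_id_exons)[names(gene_id_exons) == "GENEID"] = "ENTREZID"
# Look up gene symbols (exons without a gene ID are skipped in the lookup)
symbol <- select(org.Mm.eg.db, keys=as.character(unique(na.omit(gene_id_exons$ENTREZID))), keytype="ENTREZID",
columns="SYMBOL")
gene_id_exons = merge(gene_id_exons, symbol, by="ENTREZID", all.x=TRUE)
# Combine exon coordinates with the gene and transcript annotation
exon_info = data.frame(START = start(exon_ranges), END = end(exon_ranges), CHR = seqnames(exon), STRAND = strand(exon), EXONID = exon$exon_id)
exon_info = merge(exon_info, gene_id_exons, by="EXONID", all.x=TRUE)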
> subset(exon_info, ENTREZID == 497097)
      EXONID   START     END  CHR STRAND ENTREZID     TXNAME SYMBOL
14642   7584 3195985 3197398 chr1      -   497097 uc007aet.1   Xkr4
14643   7585 3203520 3205713 chr1      -   497097 uc007aet.1   Xkr4
14644   7586 3204563 3207049 chr1      -   497097 uc007aeu.1   Xkr4
14645   7587 3411783 3411982 chr1      -   497097 uc007aeu.1   Xkr4
14646   7588 3638392 3640590 chr1      -   497097 uc007aev.1   Xkr4
14647   7589 3648928 3648985 chr1      -   497097 uc007aev.1   Xkr4
14648   7590 3660633 3661579 chr1      -   497097 uc007aeu.1   Xkr4
There are 3 transcripts of gene Xkr4: uc007aet.1, uc007aeu.1 and uc007aev.1.
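To make the transcript count easy to check, here is a minimal base R sketch that tabulates the exons per transcript and the region each transcript's exons cover. It assumes nothing beyond the rows printed above, which are copied by hand into a small data frame; the names `xkr4`, `exons_per_tx` and `tx_span` are made up for this illustration.

```r
# Rows copied from the subset(exon_info, ENTREZID == 497097) output above.
xkr4 <- data.frame(
  EXONID = 7584:7590,
  START  = c(3195985, 3203520, 3204563, 3411783, 3638392, 3648928, 3660633),
  END    = c(3197398, 3205713, 3207049, 3411982, 3640590, 3648985, 3661579),
  TXNAME = c("uc007aet.1", "uc007aet.1", "uc007aeu.1", "uc007aeu.1",
             "uc007aev.1", "uc007aev.1", "uc007aeu.1"),
  stringsAsFactors = FALSE
)

# Number of exon rows listed under each transcript name.
exons_per_tx <- table(xkr4$TXNAME)
print(exons_per_tx)

# Smallest start and largest end among each transcript's exons.
# tapply() and table() both order groups by sorted transcript name,
# so the three vectors line up.
tx_span <- data.frame(
  TXNAME = names(exons_per_tx),
  START  = as.vector(tapply(xkr4$START, xkr4$TXNAME, min)),
  END    = as.vector(tapply(xkr4$END, xkr4$TXNAME, max)),
  stringsAsFactors = FALSE
)
print(tx_span)
```

The same `table()` and `tapply()` calls should work directly on `subset(exon_info, ENTREZID == 497097)` instead of the hand-copied rows.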
The Genome Browser gives me the following information: Mouse Gene mKIAA1889 (uc007aet.1), Mouse Gene Xkr4 (uc007aeu.1), Mouse Gene AK149000 (uc007aev.1). However, RefSeq says all 3 are Xkr4.
I do not know why, but only one transcript is included in the in-built annotation of featureCounts in the Rsubread package (see "Rsubread, featureCounts misses some exons"). And why does the Genome Browser show them as three different genes?