I did some digging in the code that I use to make these org packages and traced the origin of this particular mapping data back to ensembl. We primarily use NCBI for our entrez gene based packages, but so many people need to use ensembl data that we also supplement the mapping of our ensembl IDs with data directly from ensembl. (Ensembl is used for connecting ensembl transcript, ensembl protein and ensembl gene ids to the respective entrez gene IDs.) You can see for yourself how this would have happened with this code here:
library(biomaRt)
## get the mart
mart = useMart('ensembl',dataset='hsapiens_gene_ensembl')
## get all of the data from ensembl for entrez gene ids to ensembl ids
egdata = getBM(c("ensembl_gene_id","entrezgene"), mart = mart)
## Now if you look at the data for id ENSG00000184009
## You can see that the data matches two separate
## (and different) entrez gene ids.
egdata[egdata$ensembl_gene_id=='ENSG00000184009',]
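If you want to find every ensembl gene id that behaves this way, rather than checking one id at a time, something like the following sketch may help. To keep it runnable without a network connection it uses a small hand-built data frame as a stand-in for the getBM() result. That table is an assumption for illustration, not real query output, and the names pairs, n_entrez and multi are just made up for this example. With the real egdata from above you would skip the first step.

```r
## Toy stand-in for the getBM() result (hand-built for illustration,
## NOT real query output); columns mirror the attributes requested above
egdata <- data.frame(
  ensembl_gene_id = c("ENSG00000184009", "ENSG00000184009", "ENSG00000075624"),
  entrezgene = c(60L, 71L, 60L),
  stringsAsFactors = FALSE
)

## Drop rows with no entrez gene id, then remove duplicated pairs
pairs <- unique(egdata[!is.na(egdata$entrezgene), ])

## Count the distinct entrez gene ids attached to each ensembl gene id
n_entrez <- tapply(pairs$entrezgene, pairs$ensembl_gene_id, length)

## Keep the ensembl gene ids that are linked to more than one entrez gene id
multi <- names(n_entrez)[n_entrez > 1]

## Show all rows for those ids
pairs[pairs$ensembl_gene_id %in% multi, ]
```

Counting after unique() matters here, because the same pair can otherwise be counted more than once and make a one-to-one mapping look ambiguous.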
So that pointed to data coming from ensembl as the source of this strange result. But ensembl is a highly reliable data source (this is why we use them to help build our ensembl to entrez gene mappings). So I contacted them to ask what was happening and they pointed me here. And if you scroll down you can see that there are in fact entrez gene IDs for both ACTB and ACTG1. So how did that happen?
Thomas Maurel patiently explained the following to me in correspondence when I asked him about it. I felt his explanation was very good (and also that he deserves credit for doing this part of the investigation), so I am re-posting that part of his response here:
"Our cross reference mapping system is quite complex but as a general rule an Ensembl Gene, Transcript or Translation ID can be linked to multiple external ids from the same source.
All the EntrezGene ids are imported into Ensembl via RefSeq mappings. For this example, we see that all RefSeq mappings we have for this gene (via transcripts and translations) correspond to ACTB, apart from one.
The predicted transcript, XM_006722048.1, aligns against one of the transcripts and corresponds to ACTG1, according to NCBI annotation.
This can also be verified using the website:
http://www.ensembl.org/Homo_sapiens/Share/2b2a0c24d24821539cade73652314960162085327
If you look at the bottom, all the mapped RefSeq sequences, you can see which gene they correspond to when clicking on the individual links.
XM_006722048.1 aligns correctly against the Ensembl transcript ENST00000573283, and only this one.
It seems that the HGNC name agrees with the name we would get via the predicted sequence, and not the curated RefSeq entries. The curated RefSeq entry is mapped via overlap. On the following link
http://www.ensembl.org/Homo_sapiens/Share/95e7c9b5184b4971ae83e302e7b088b4162085327, you can see that 5 RefSeq entries, all corresponding to ACTB, overlap our Ensembl gene.
To conclude, until the various resources (here, HGNC and RefSeq), agree on a same name, we can only do our best and display all the information we have available."
Anyhow, that seems like a pretty complete explanation of the current circumstance to me. I hope that you are satisfied with it. But if not, you now know who to try to contact (NCBI's RefSeq resources) about the apparent discrepancy.
Marc
This appears to be a recent phenomenon:
But in a more recent version of BioC: