There is no way to specify the source of gene symbols for an OrgDb. For TEC, one symbol comes from HGNC, and the other comes from OMIM. When we generate the OrgDb packages, we don't distinguish between sources, as they are all (as far as NCBI is concerned) 'real' gene symbols. Unfortunately, gene symbols are not unique, come from different sources, and get retired regularly, so ideally one would not use them for anything but presenting data to a biologist, for whom the gene symbol is usually the primary ID.
The easy way to get around this is to use mapIds instead.
> z <- mapIds(org.Hs.eg.db, c("TEC", "MEMO1"), "ENTREZID","SYMBOL")
'select()' returned 1:many mapping between keys and columns
> data.frame(ENTREZID = z, SYMBOL = names(z))
      ENTREZID SYMBOL
TEC       7006    TEC
MEMO1     7795  MEMO1
But do note that this is a naive implementation that simply picks the first match for each symbol:
> mapIds(org.Hs.eg.db, c("TEC", "MEMO1"), "ENTREZID","SYMBOL", multiVals = "list")
'select()' returned 1:many mapping between keys and columns
$TEC
[1] "7006" "100124696"
$MEMO1
[1] "7795" "51072"
OK, I understand the trouble here. When I look up the ENTREZIDs from the output, I can see that the two ENTREZIDs for a symbol point to two different genes that share the same gene symbol in NCBI. However, one symbol is approved by HGNC and the other is not. How can I tell AnnotationDbi to treat my gene symbols as the ones approved by HGNC when I retrieve the ENTREZIDs? It is much clearer if you look up MEMO1 in NCBI.
Bottom line: is there a way to specify gene symbols as HGNC gene symbols in AnnotationDbi?
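Going by the answer above, AnnotationDbi itself has no such option, so one possible workaround is to filter the 1:many result afterwards. Here is a minimal sketch in base R, assuming you have built a symbol-to-Entrez lookup from HGNC's own download yourself. The names hgnc_approved and pick_hgnc are hypothetical, and the two IDs marked as approved are an assumption that should be checked against HGNC before use.

```r
# 1:many mapping as returned in the thread by
# mapIds(org.Hs.eg.db, c("TEC", "MEMO1"), "ENTREZID", "SYMBOL", multiVals = "list")
hits <- list(
  TEC   = c("7006", "100124696"),
  MEMO1 = c("7795", "51072")
)

# Hypothetical lookup: HGNC-approved symbol -> Entrez Gene ID.
# In practice this would be read from a table downloaded from HGNC;
# the values below are assumptions for illustration only.
hgnc_approved <- c(TEC = "7006", MEMO1 = "51072")

# For each symbol, keep the candidate ID that matches the approved lookup.
# Returns NA when no candidate (or more than one) matches.
pick_hgnc <- function(hits, approved) {
  vapply(names(hits), function(sym) {
    keep <- unname(intersect(hits[[sym]], approved[sym]))
    if (length(keep) == 1L) keep else NA_character_
  }, character(1))
}

resolved <- pick_hgnc(hits, hgnc_approved)
print(data.frame(ENTREZID = resolved, SYMBOL = names(resolved)))
```

With this lookup, MEMO1 would resolve to a different ID than the first-match default shown earlier, which is the point of filtering explicitly rather than relying on the order of the results.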