Missing GenBank accession numbers in the org.Hs.egACCNUM object?
colaneri ▴ 30
@colaneri-7770

I am trying to convert GenBank accession numbers to Entrez Gene IDs or gene symbols using the org.Hs.egACCNUM object. However, many of the GenBank accession numbers in my list do not exist in the object. The description of this object says: “This object is a simple mapping of Entrez Gene identifiers https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene to all possible GenBank accession numbers”.

For example, if I search for AA725246 at NCBI, I find this record:

=====================

ai16b08.s1 Soares_parathyroid_tumor_NbHPA Homo sapiens cDNA clone 1342935 3- similar to contains Alu repetitive element;, mRNA sequence

410 bp expressed sequence tag.

Accession: AA725246.1 GI: 2742953

====================

However, if I try to find AA725246 in org.Hs.egACCNUM, I cannot find it.

For example, in R:

> library(org.Hs.eg.db)
> k <- keys(org.Hs.eg.db, keytype="ACCNUM")
> "ABF01637" %in% k
[1] TRUE
> "AA725246" %in% k
[1] FALSE

How can I convert this kind of accession number to gene symbol?

@james-w-macdonald-5106

The org.Hs.eg.db package is based on the Entrez Gene table, so by definition anything that doesn't have an Entrez Gene ID is invisible to that annotation package.

Put another way, as you noted, the description for the ACCNUM table is “This object is a simple mapping of Entrez Gene identifiers https://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene to all possible GenBank accession numbers”. So if no Entrez Gene ID maps to AA725246, which is just an EST that never seems to have made the leap to the big time, then the mapping can't be made.
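
To see up front which accessions in a list the package can map at all, something along these lines might help. It is only a sketch: `split_by_known` is a hypothetical helper written for this post, not part of any package, and stripping the version suffix (".1") assumes the ACCNUM keys are stored without versions. The lookup itself uses the standard `keys()` and `select()` interface from AnnotationDbi.

```r
# Hypothetical helper (not from any package): split accessions into those
# present in a vector of known keys and those that are absent.
split_by_known <- function(acc, known) {
  # Assumption: keys are unversioned, so drop a trailing ".1", ".2", ...
  bare <- sub("\\.[0-9]+$", "", acc)
  found <- bare %in% known
  list(mappable = unique(bare[found]), unmapped = unique(bare[!found]))
}

# Only run the lookup if the annotation packages are installed.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) &&
    requireNamespace("AnnotationDbi", quietly = TRUE)) {
  db <- org.Hs.eg.db::org.Hs.eg.db
  acc <- c("ABF01637", "AA725246.1")
  parts <- split_by_known(acc, AnnotationDbi::keys(db, keytype = "ACCNUM"))

  # select() is only called with keys known to exist in the ACCNUM table.
  if (length(parts$mappable) > 0) {
    print(AnnotationDbi::select(db, keys = parts$mappable,
                                keytype = "ACCNUM",
                                columns = c("ENTREZID", "SYMBOL")))
  }

  # Accessions with no Entrez Gene ID in this package (e.g. unassigned ESTs).
  print(parts$unmapped)
}
```

Filtering first keeps the unmappable accessions in a separate vector, so they can be reported or looked up elsewhere rather than silently dropped.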


I see, thanks for the clarification. Then I guess the question is: is there any available tool to convert these accession numbers to gene identifiers?

I ask because the accession numbers I am having trouble converting come from papers reporting gene expression differences measured by microarray. That means they were used with the intention of representing a gene.
