Question

What is an NCBI Gene ID, where can I find it, and how do I convert it to an Entrez ID?

Posted by mnazir

Hi,

I am new to bioinformatics and have only recently started learning, so I would appreciate some guidance on gene IDs. I want to upload my RNA-Seq data to KEGG Exp to draw pathways based on my differential expression analysis. The input file requires gene symbols and Gene IDs, and I don't know where to find these for the microorganism I am working with, Clostridium beijerinckii NCIMB 8052. From papers and other material I have read, I understand that a Gene ID is a number that uniquely identifies a gene, but I can't find a source that lists them.

Thanks a lot

Last updated by James W. MacDonald

Accepted Answer · 2019-10-07

You can always look in AnnotationHub.

> library(AnnotationHub)

> hub <- AnnotationHub()
snapshotDate(): 2019-05-02

> query(hub, c("clostridium","orgdb"))
AnnotationHub with 4 records
# snapshotDate(): 2019-05-02 
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Clostridium beijerinckii, Clostridium bolteae_90A9, Clostridium ...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH73748"]]' 

            title                                   
  AH73748 | org.Clostridium_bolteae_90A9.eg.sqlite  
  AH73749 | org.[Clostridium]_bolteae_90A9.eg.sqlite
  AH73787 | org.Clostridium_beijerinckii.eg.sqlite  
  AH73788 | org.Clostridium_rubrum.eg.sqlite        
> orgdb <- hub[["AH73787"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache 
     AH73787 : 80533 

> orgdb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Clostridium beijerinckii
| SPECIES: Clostridium beijerinckii
| CENTRALID: GID
| Taxonomy ID: 1520
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information
> select(orgdb, head(keys(orgdb)), c("SYMBOL","ENTREZID"))
'select()' returned 1:1 mapping between keys and columns
       GID       SYMBOL ENTREZID
1 31662982 LF65_RS10675 31662982
2  5329690     NEWENTRY  5329690
3 31660858 LF65_RS00005 31660858
4 31660859 LF65_RS00010 31660859
5 31660860 LF65_RS00015 31660860
6 31660861 LF65_RS00020 31660861

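An NCBI Gene ID is the same thing as an Entrez Gene ID (or an Entrez ID, for that matter). NCBI dropped the "Entrez" part of the name years ago, but habits die hard, I suppose. You can see this in the output above: the central key (GID) and the ENTREZID column contain identical numbers, while SYMBOL holds locus tags such as LF65_RS00005.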
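Because GID and ENTREZID are interchangeable here, converting your own gene list is just a table lookup. The sketch below is plain base R and assumes your differential expression results are keyed by locus tags that match the SYMBOL column. The mapping rows are copied from the select() output above; the locus tag LF65_RS99999 and the log2 fold changes are hypothetical placeholders. With the OrgDb loaded, the same lookup can be done directly with AnnotationDbi's `mapIds(orgdb, keys = ..., column = "ENTREZID", keytype = "SYMBOL")`, assuming SYMBOL is listed by `keytypes(orgdb)`.

```r
# Mapping table copied from the select() output above.
# In a real session you would build it from the OrgDb instead, e.g.
#   map <- select(orgdb, keys(orgdb), c("SYMBOL", "ENTREZID"))
map <- data.frame(
  GID      = c("31662982", "5329690", "31660858", "31660859", "31660860", "31660861"),
  SYMBOL   = c("LF65_RS10675", "NEWENTRY", "LF65_RS00005", "LF65_RS00010",
               "LF65_RS00015", "LF65_RS00020"),
  ENTREZID = c("31662982", "5329690", "31660858", "31660859", "31660860", "31660861"),
  stringsAsFactors = FALSE
)

# In this OrgDb the central key (GID) and ENTREZID hold the same NCBI Gene ID.
stopifnot(identical(map$GID, map$ENTREZID))

# Hypothetical DE results keyed by locus tag; the log2FC values are placeholders,
# and LF65_RS99999 is a made-up tag that will not be found in the mapping.
de <- data.frame(
  locus_tag = c("LF65_RS00005", "LF65_RS00015", "LF65_RS99999"),
  log2FC    = c(1.8, -0.6, 2.3),
  stringsAsFactors = FALSE
)

# Look up the Entrez (NCBI Gene) ID for each locus tag; unmatched tags get NA.
de$ENTREZID <- map$ENTREZID[match(de$locus_tag, map$SYMBOL)]

# Keep only the genes that mapped, as a two-column ID/value table.
out <- de[!is.na(de$ENTREZID), c("ENTREZID", "log2FC")]

# Print tab-separated to the console; pass file = "..." to write it to disk instead.
write.table(out, sep = "\t", quote = FALSE, row.names = FALSE)
```

Note that `match()` keeps only the first hit for each locus tag and returns NA for tags it cannot find, so it is worth checking how many genes dropped out, and what column layout your KEGG tool expects, before uploading.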

At the very least, you should read the AnnotationDbi vignette, the AnnotationHub vignette and the AnnotationHub HOWTO, which should help you get started.