What is an NCBI Gene ID, where can I find it, and how do I convert it to an Entrez ID?
mnazir

Hi,

I am new to bioinformatics and have only recently started learning. I have a question about gene IDs and would appreciate some guidance. I want to upload my RNA-Seq data to KEGG Exp to draw pathways based on my differential expression analysis. The upload file requires gene symbols and gene IDs, and I don't know where to find these for the microorganism I am working with, Clostridium beijerinckii NCIMB 8052. From the papers and other material I have read, I understand that the Gene ID is a number that uniquely identifies each gene, but I am struggling to find a source where I can look it up.

Thanks a lot

@james-w-macdonald-5106

You can always look in AnnotationHub, which has an OrgDb for Clostridium beijerinckii:

> library(AnnotationHub)

> hub <- AnnotationHub()
snapshotDate(): 2019-05-02

> query(hub, c("clostridium","orgdb"))
AnnotationHub with 4 records
# snapshotDate(): 2019-05-02 
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Clostridium beijerinckii, Clostridium bolteae_90A9, Clostridium ...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH73748"]]' 

            title                                   
  AH73748 | org.Clostridium_bolteae_90A9.eg.sqlite  
  AH73749 | org.[Clostridium]_bolteae_90A9.eg.sqlite
  AH73787 | org.Clostridium_beijerinckii.eg.sqlite  
  AH73788 | org.Clostridium_rubrum.eg.sqlite        
> orgdb <- hub[["AH73787"]]
downloading 1 resources
retrieving 1 resource
  |======================================================================| 100%

loading from cache 
     AH73787 : 80533 

> orgdb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Clostridium beijerinckii
| SPECIES: Clostridium beijerinckii
| CENTRALID: GID
| Taxonomy ID: 1520
| Db type: OrgDb
| Supporting package: AnnotationDbi

Please see: help('select') for usage information
> select(orgdb, head(keys(orgdb)), c("SYMBOL","ENTREZID"))
'select()' returned 1:1 mapping between keys and columns
       GID       SYMBOL ENTREZID
1 31662982 LF65_RS10675 31662982
2  5329690     NEWENTRY  5329690
3 31660858 LF65_RS00005 31660858
4 31660859 LF65_RS00010 31660859
5 31660860 LF65_RS00015 31660860
6 31660861 LF65_RS00020 31660861

An NCBI Gene ID is the same thing as an Entrez Gene ID (or an Entrez ID, for that matter). In the OrgDb above, the GID column holds these IDs, which is why it matches the ENTREZID column. NCBI dropped the "Entrez" part years ago, but habits die hard, I suppose.
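A minimal sketch of checking that equivalence and getting the IDs into a spreadsheet-friendly table follows. The function name `export_gene_table` and the output file name are hypothetical, and the six rows are simply the ones printed in the session above, typed in by hand; with the real `orgdb` object you would pass the result of `select(orgdb, keys(orgdb), c("SYMBOL","ENTREZID"))` instead.

```r
# Hypothetical helper (not part of AnnotationDbi): assumes `gene_table` is a
# data frame shaped like the output of
# select(orgdb, keys(orgdb), c("SYMBOL", "ENTREZID")) shown above.
export_gene_table <- function(gene_table, path) {
  # For this NCBI-derived OrgDb, GID is the NCBI Gene (Entrez) ID,
  # so the two columns are expected to match row by row.
  same_id <- identical(as.character(gene_table$GID),
                       as.character(gene_table$ENTREZID))
  if (!same_id) warning("GID and ENTREZID columns differ")
  # A CSV file opens directly in Excel.
  write.csv(gene_table, path, row.names = FALSE)
  invisible(same_id)
}

# The six rows printed in the session above, typed in for illustration.
gene_table <- data.frame(
  GID      = c("31662982", "5329690", "31660858",
               "31660859", "31660860", "31660861"),
  SYMBOL   = c("LF65_RS10675", "NEWENTRY", "LF65_RS00005",
               "LF65_RS00010", "LF65_RS00015", "LF65_RS00020"),
  ENTREZID = c("31662982", "5329690", "31660858",
               "31660859", "31660860", "31660861"),
  stringsAsFactors = FALSE
)

out_file <- file.path(tempdir(), "c_beijerinckii_gene_ids.csv")
ok <- export_gene_table(gene_table, out_file)
print(ok)

# With the real OrgDb loaded as above, the whole genome would be:
# all_genes <- select(orgdb, keys(orgdb), c("SYMBOL", "ENTREZID"))
# export_gene_table(all_genes, "c_beijerinckii_gene_ids.csv")
```

For the full genome, `select()` may report a 1:many mapping for some keys; in that case the helper only warns and still writes the table.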

At the very least, you should read the AnnotationDbi vignette, the AnnotationHub vignette, and the AnnotationHub HOWTO, which should help you get started.


Hi James,

Thanks for your input, I appreciate it. So, as I understand your answer, I would need to install R and Bioconductor to run the AnnotationHub package, right? And the code above was run in R? I want to download the Entrez Gene IDs for the whole-genome annotation as a table or Excel sheet, and I hope this can be done with this package. I will try to learn this and get back to you with more specific questions.

Best


This support site is intended to help people with technical issues using Bioconductor software, all of which are R packages. If you want to do things some other way, then you are in the wrong place. You could probably try Biostars?


User did indeed go to Biostars: https://www.biostars.org/p/402051/
