You can always look on the AnnotationHub.
> library(AnnotationHub)
> hub <- AnnotationHub()
snapshotDate(): 2019-05-02
> query(hub, c("clostridium","orgdb"))
AnnotationHub with 4 records
# snapshotDate(): 2019-05-02
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Clostridium beijerinckii, Clostridium bolteae_90A9, Clostridium ...
# $rdataclass: OrgDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH73748"]]'
title
AH73748 | org.Clostridium_bolteae_90A9.eg.sqlite
AH73749 | org.[Clostridium]_bolteae_90A9.eg.sqlite
AH73787 | org.Clostridium_beijerinckii.eg.sqlite
AH73788 | org.Clostridium_rubrum.eg.sqlite
> orgdb <- hub[["AH73787"]]
downloading 1 resources
retrieving 1 resource
|======================================================================| 100%
loading from cache
AH73787 : 80533
> orgdb
OrgDb object:
| DBSCHEMAVERSION: 2.1
| DBSCHEMA: NOSCHEMA_DB
| ORGANISM: Clostridium beijerinckii
| SPECIES: Clostridium beijerinckii
| CENTRALID: GID
| Taxonomy ID: 1520
| Db type: OrgDb
| Supporting package: AnnotationDbi
Please see: help('select') for usage information
> select(orgdb, head(keys(orgdb)), c("SYMBOL","ENTREZID"))
'select()' returned 1:1 mapping between keys and columns
       GID       SYMBOL ENTREZID
1 31662982 LF65_RS10675 31662982
2  5329690     NEWENTRY  5329690
3 31660858 LF65_RS00005 31660858
4 31660859 LF65_RS00010 31660859
5 31660860 LF65_RS00015 31660860
6 31660861 LF65_RS00020 31660861
An NCBI Gene ID is the same thing as an EntrezGene ID (or an Entrez ID, for that matter). It's been years since NCBI dropped the Entrez part, but habits die hard, I suppose.
You should read the AnnotationDbi vignette, the AnnotationHub vignette, and the AnnotationHub HOWTO at the very least; they should help you get started.
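If what you are after is every gene rather than just the first six, the same select() call should work on the full set of keys, and the result is an ordinary data.frame that can be written to a file. Here is a minimal sketch, assuming the AH73787 record from the 2019-05-02 snapshot shown above is still available under that ID in your hub (IDs can change between snapshots, so re-run the query if it is not). The output file name is just a placeholder I made up.

```r
library(AnnotationHub)
library(AnnotationDbi)

hub <- AnnotationHub()

# Record ID taken from the query output above; it may differ in newer snapshots
orgdb <- hub[["AH73787"]]

# Map every key (the central GID) to its symbol and Entrez Gene ID
all_genes <- AnnotationDbi::select(orgdb,
                                   keys = keys(orgdb),
                                   columns = c("SYMBOL", "ENTREZID"))

# Write the table to a CSV file (hypothetical file name), which Excel can open
write.csv(all_genes, "c_beijerinckii_gene_ids.csv", row.names = FALSE)
```

Calling columns(orgdb) lists the other annotation fields you could add to the columns argument.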
Hi James,
Thanks for your input, I appreciate it. If I understand your answer correctly, I would have to install Bioconductor or R to be able to run the AnnotationHub package, right? And the code above, which you ran with this package, is R code? I intend to download the Entrez Gene ID numbers for the whole genome annotation in the form of a table or Excel sheet; I hope this can be done using this package. I will try to learn this and get back to you with more specific questions.
Best
This support site is intended to help people with technical issues using Bioconductor software, all of which are R packages. If you want to do things other ways, then you are in the wrong place. You could probably try Biostars?
User did indeed go to Biostars: https://www.biostars.org/p/402051/