Hello,
I have just encountered an issue trying to get NCBI gene IDs (entrezgene_id) from Ensembl version 107 using biomaRt. It looks like there could be something missing from the version 107 archive related to these IDs: ensembl_mart_107.hsapiens_gene_ensembl__ox_entrezgene__dm.
I did not encounter a problem when retrieving only gene names from version 107, or when retrieving NCBI gene IDs from the new version 108.
Best regards,
Jamie.
library(biomaRt)
mart <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl", version = 107)
query <- c("ENSG00000075624", "ENSG00000111640", "ENSG00000165704")
biomaRt.annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                            filters = "ensembl_gene_id", values = query, mart = mart)
biomaRt.annotation
# ensembl_gene_id external_gene_name
# 1 ENSG00000075624 ACTB
# 2 ENSG00000111640 GAPDH
# 3 ENSG00000165704 HPRT1
# Adding entrezgene_id to the same query against version 107 fails:
biomaRt.annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "entrezgene_id"),
                            filters = "ensembl_gene_id", values = query, mart = mart)
# Error in .processResults(postRes, mart = mart, hostURLsep = sep, fullXmlQuery = fullXmlQuery, :
# Query ERROR: caught BioMart::Exception::Database: Error during query execution: Table 'ensembl_mart_107.hsapiens_gene_ensembl__ox_entrezgene__dm' doesn't exist
# Without a version argument the current release (108) is used, and the query succeeds:
mart <- useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl")
biomaRt.annotation <- getBM(attributes = c("ensembl_gene_id", "external_gene_name", "entrezgene_id"),
                            filters = "ensembl_gene_id", values = query, mart = mart)
biomaRt.annotation
# ensembl_gene_id external_gene_name entrezgene_id
# 1 ENSG00000075624 ACTB 60
# 2 ENSG00000111640 GAPDH 2597
# 3 ENSG00000165704 HPRT1 3251
sessionInfo()
# R version 4.1.2 (2021-11-01)
# Platform: x86_64-apple-darwin17.0 (64-bit)
# Running under: macOS Catalina 10.15.7
#
# Matrix products: default
# BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
#
# locale:
# [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] biomaRt_2.50.1
#
# loaded via a namespace (and not attached):
# [1] KEGGREST_1.34.0 progress_1.2.2 tidyselect_1.1.1 purrr_0.3.4
# [5] vctrs_0.3.8 generics_0.1.1 stats4_4.1.2 BiocFileCache_2.2.0
# [9] utf8_1.2.2 blob_1.2.2 XML_3.99-0.8 rlang_0.4.12
# [13] pillar_1.6.4 glue_1.6.0 withr_2.4.3 DBI_1.1.1
# [17] rappdirs_0.3.3 BiocGenerics_0.40.0 bit64_4.0.5 dbplyr_2.1.1
# [21] GenomeInfoDbData_1.2.7 lifecycle_1.0.1 stringr_1.4.0 zlibbioc_1.40.0
# [25] Biostrings_2.62.0 memoise_2.0.1 Biobase_2.54.0 IRanges_2.28.0
# [29] fastmap_1.1.0 GenomeInfoDb_1.30.0 curl_4.3.2 AnnotationDbi_1.56.2
# [33] fansi_0.5.0 Rcpp_1.0.7 filelock_1.0.2 cachem_1.0.6
# [37] S4Vectors_0.32.3 XVector_0.34.0 bit_4.0.4 hms_1.1.1
# [41] png_0.1-7 digest_0.6.29 stringi_1.7.6 dplyr_1.0.7
# [45] tools_4.1.2 bitops_1.0-7 magrittr_2.0.1 RCurl_1.98-1.5
# [49] RSQLite_2.2.9 tibble_3.1.6 crayon_1.4.2 pkgconfig_2.0.3
# [53] ellipsis_0.3.2 xml2_1.3.3 prettyunits_1.1.1 assertthat_0.2.1
# [57] httr_1.4.2 R6_2.5.1 compiler_4.1.2
Thanks James. The same problem does indeed occur when accessing the NCBI gene ID table through BioMart on the 107 archive website. However, the relevant table is currently accessible via FTP.
I have contacted the Ensembl help desk directly.
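Until the archive is repaired, a script that depends on this query could guard against the failure. The following is only a minimal sketch, not something biomaRt provides: `first_successful` is a hypothetical helper that runs a named list of query functions in order and returns the first result that does not raise an error, recording which one was used. Note that falling back from version 107 to the current release may change the annotation, so whether this is acceptable depends on the analysis.

```r
# Hypothetical helper (not part of biomaRt): try each query in turn and
# return the first result that does not raise an error.
first_successful <- function(queries) {
  for (label in names(queries)) {
    result <- tryCatch(
      queries[[label]](),
      error = function(e) {
        # Report the failure (e.g. the "Table ... doesn't exist" error) and move on
        message("Query '", label, "' failed: ", conditionMessage(e))
        NULL
      }
    )
    if (!is.null(result)) {
      # Record which query produced the result
      attr(result, "source") <- label
      return(result)
    }
  }
  stop("All queries failed")
}

# Example use with biomaRt (needs network access, so it is left commented out).
# `query` is the vector of Ensembl gene IDs defined in the original post.
# library(biomaRt)
# attrs <- c("ensembl_gene_id", "external_gene_name", "entrezgene_id")
# run <- function(mart) {
#   getBM(attributes = attrs, filters = "ensembl_gene_id", values = query, mart = mart)
# }
# res <- first_successful(list(
#   v107    = function() run(useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl", version = 107)),
#   current = function() run(useEnsembl("ensembl", dataset = "hsapiens_gene_ensembl"))
# ))
# attr(res, "source")  # which release actually answered
```

Wrapping each attempt in a function delays the `useEnsembl()` call until it is needed, so the fallback mart is only contacted if the first query fails.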