It seems to me that annotate::lookUp used to work for mapping ENSEMBL gene IDs to ENTREZ IDs. I used to call it in the following way:
annotate::lookUp("ENSG00000121410", data = "org.Hs.eg", what = "ENSEMBL")
However, it seems that it doesn't work anymore for me even though the mapping is definitely available.
Verify mapping availability:
> temp <- toTable(org.Hs.egENSEMBL)
> head(temp)
  gene_id      ensembl_id
1       1 ENSG00000121410
2       2 ENSG00000175899
3       3 ENSG00000256069
4       9 ENSG00000171428
5      10 ENSG00000156006
6      12 ENSG00000196136
Take the first ENSEMBL gene from temp to use with annotate::lookUp (it should return 1; however, it returns NA):
> annotate::lookUp("ENSG00000121410", data = "org.Hs.eg", what = "ENSEMBL")
$ENSG00000121410
[1] NA
Lookup by ENTREZ actually works:
> annotate::lookUp("1", data = "org.Hs.eg", what = "ENSEMBL")
$`1`
[1] "ENSG00000121410"
Looking up ENTREZ IDs by gene symbol using ALIAS2EG also works:
> annotate::lookUp("STPG1", data = "org.Hs.eg", what = "ALIAS2EG")
$STPG1
[1] "90529"
I can obtain the mapping from ENSEMBL to ENTREZ using mapIds:
> AnnotationDbi::mapIds(org.Hs.eg.db, key = "ENSG00000121410", column = "ENTREZID", keytype = "ENSEMBL")
'select()' returned 1:1 mapping between keys and columns
ENSG00000121410
"1"
What am I doing wrong? I have built a package that relies on annotate::lookUp, and it works for gene symbols but not for Ensembl genes.
Thank you very much for your time and help!
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Hs.eg.db_3.8.2 annotate_1.62.0 XML_3.98-1.19 AnnotationDbi_1.46.0
[5] IRanges_2.18.0 S4Vectors_0.22.0 Biobase_2.44.0 BiocGenerics_0.30.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.1 digest_0.6.18 bitops_1.0-6 xtable_1.8-4 DBI_1.0.0 RSQLite_2.1.1
[7] blob_1.1.1 tools_3.6.0 bit64_0.9-7 RCurl_1.95-4.12 bit_1.1-14 compiler_3.6.0
[13] pkgconfig_2.0.2 memoise_1.1.0
Thank you, James! You are right, this does work:
but I thought that I was supposed to use the string from the package object name, like here: org.Hs.egENSEMBL, so I would take "ENSEMBL"... Anyway, not arguing! :) I will take your advice and convert my code to mapIds. Thank you very much!
Most of the BiMaps that are available in a package are central key -> other thing. So org.Hs.egENSEMBL is a mapping from Gene ID (the central key) -> Ensembl ID (the other thing). There is a function called revmap that reverses that mapping.

Anything with a 2EG at the end of its name is a pre-formed revmap of an existing BiMap that you could hypothetically use.

But it seems to me that it's just way easier (programmatically) to get mappings using select or mapIds rather than trying to figure out which BiMap you need for a given mapping.
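The three routes described in this answer can be sketched side by side. This is only an illustration, assuming annotate, AnnotationDbi and org.Hs.eg.db are installed. The variable names (ens, ens2eg, via_revmap, via_lookup, via_mapids) are made up for the example, and the map name "ENSEMBL2EG" is assumed from the 2EG naming pattern mentioned above.

```r
library(AnnotationDbi)
library(org.Hs.eg.db)

# The Ensembl gene ID from the question (illustrative input).
ens <- "ENSG00000121410"

# Route 1: org.Hs.egENSEMBL maps Gene ID -> Ensembl ID, so reverse it
# with revmap() before looking up an Ensembl ID.
ens2eg <- revmap(org.Hs.egENSEMBL)
via_revmap <- mget(ens, ens2eg, ifnotfound = NA)

# Route 2: name the pre-formed reverse map (assumed to be "ENSEMBL2EG")
# in lookUp(), the same way "ALIAS2EG" is used for gene symbols.
via_lookup <- annotate::lookUp(ens, data = "org.Hs.eg", what = "ENSEMBL2EG")

# Route 3: let mapIds() work out the direction from keytype and column.
via_mapids <- mapIds(org.Hs.eg.db, keys = ens,
                     column = "ENTREZID", keytype = "ENSEMBL")
```

The first two routes return a list named by the input ID, while mapIds returns a named character vector, so code in a package that switches between them may need an unlist() or as.list() at the boundary.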