Ideally the database wouldn't include the version suffix on the accession numbers. In general we strip it off, because it isn't needed for annotation (e.g., the changes between, say, EGW048382.1 and EGW048382.2 won't change which gene we are talking about).
Anyway, the easiest thing to do is to make a data.frame that has the (suffix-stripped) accession numbers in one column and the gene symbols in the other, and use match() to line them up with your own accession numbers.
> library(AnnotationHub)
> hub <- AnnotationHub()
snapshotDate(): 2016-03-09
> z <- hub[["AH48061"]]
> mapper <- select(z, keys(z), c("ACCNUM","SYMBOL"))
'select()' returned 1:many mapping between keys and columns
> mapper$ACCNUM <- sub("\\.[0-9]+$", "", as.character(mapper$ACCNUM))
> head(mapper)
      GID    ACCNUM SYMBOL
1 3979178  ABD49734   ND4L
2 3979178  ACC86255   ND4L
3 3979178 YP_537127   ND4L
4 3979179  ABD49735    ND4
5 3979179 YP_537128    ND4
6 3979180  ABD49736    ND5
Then say you have a set of accession numbers (here I fake some up by sampling from the mapper itself):
> accnum <- mapper$ACCNUM[sample(1:5000, 30)]
> accnum
[1] "JP059326" "JI889453" "EGW08349" "AAA74140" "NP_001230979"
[6] "AAD30976" "NM_001246717" "XP_007646054" "EGW01067" "XP_007646591"
[11] "AAL57738" "XP_007645054" "EGW08308" "XM_007628285" "FN825776"
[16] "ABQ85432" "NM_001246755" "XP_007621204" "XM_007645844" "XP_007639173"
[21] "BAA34652" "BAA88319" "JI869646" "JP056468" "XM_007641315"
[26] "XM_003514799" "XP_007622787" "XM_007653487" "NP_001233694" "NP_001233637"
> mapped <- mapper[match(accnum, mapper$ACCNUM),]
> mapped
           GID       ACCNUM       SYMBOL
3426 100689473     JP059326        Hspd1
2156 100689312     JI889453        Gosr1
4551 100750715     EGW08349        Ints7
321  100689017     AAA74140       Srebf2
800  100689064 NP_001230979         Ldha
627  100689049     AAD30976        Mpdu1
3316 100689459 NM_001246717        Cenpa
1803 100689245 XP_007646054        Pparg
490  100689036     EGW01067         Fut9
3497 100736552 XP_007646591       Scarb1
239  100689008     AAL57738 LOC100689008
1203 100689177 XP_007645054        Pam16
4900 100750819     EGW08308        Dnm1l
3834 100750426 XM_007628285         Btrc
470  100689031     FN825776      Slc35a2
1832 100689247     ABQ85432         Cnbp
2223 100689322 NM_001246755      Slc35a1
697  100689055 XP_007621204      Lrrfip1
4796 100750781 XM_007645844        Mtmr3
3617 100750381 XP_007639173 LOC100750381
481  100689033     BAA34652      Cyp2a14
2681 100689377     BAA88319        Ercc1
833  100689069     JI869646        Gnao1
3132 100689432     JP056468         Ugcg
2954 100689407 XM_007641315      Slc19a1
4663 100750748 XM_003514799      Arfgef1
1083 100689099 XP_007622787          Vim
4620 100750728 XM_007653487 LOC100750728
2303 100689332 NP_001233694        Prdx1
3250 100689449 NP_001233637         Pgs1
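If you want to try the strip-and-match idea without pulling anything from AnnotationHub, here is a minimal sketch on made-up data. The table contents, the accession values and the strip_version() helper are all invented for illustration; only sub(), match() and cbind() are real base R functions. It also shows two things that are easy to miss: the pattern is anchored to the end of the string so multi-digit versions are removed in full, and match() gives NA for an accession that isn't in the table.

```r
# Toy annotation table standing in for the select() result (values are made up).
mapper <- data.frame(
  GID    = c("1", "1", "2", "3"),
  ACCNUM = c("AAA00001.1", "NM_000001.2", "XP_000002.10", "EGW00003.1"),
  SYMBOL = c("GeneA", "GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop a trailing ".<digits>" version suffix,
# including multi-digit versions such as ".10".
strip_version <- function(x) sub("\\.[0-9]+$", "", x)
mapper$ACCNUM <- strip_version(mapper$ACCNUM)

# Accessions from a pretend experiment; the last one is not in the table.
accnum <- strip_version(c("XP_000002.3", "AAA00001", "ZZZ99999.1"))

# match() returns the row index of the first hit, or NA when there is none,
# so the result has one row per query, in the query's order.
idx <- match(accnum, mapper$ACCNUM)
mapped <- cbind(QUERY = accnum, mapper[idx, c("GID", "SYMBOL")])
rownames(mapped) <- NULL
mapped
```

Rows that come back as NA are accessions with no entry in the annotation table, so it is worth checking sum(is.na(idx)) before going further.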
That is a great solution! Thank you very much!
A small improvement
Instead of match() I used merge() to get the GID and SYMBOL next to the values of the microarray data. To do that, I first had to rename the microarray GenBank ID column to the name used in the annotation DB ("ACCNUM"), so that merge() could join on it.
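For reference, a rough sketch of that merge() step on made-up data. The microarray column name "GenBank_ID", the logFC values and the toy mapper are assumptions for illustration, not the actual data from this thread.

```r
# Hypothetical microarray results; "GenBank_ID" is an assumed column name.
array_data <- data.frame(
  GenBank_ID = c("XP_000002", "AAA00001", "ZZZ99999"),
  logFC      = c(1.2, -0.4, 0.8),
  stringsAsFactors = FALSE
)

# Toy annotation table with version suffixes already stripped (values are made up).
mapper <- data.frame(
  GID    = c("1", "2", "3"),
  ACCNUM = c("AAA00001", "XP_000002", "EGW00003"),
  SYMBOL = c("GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# Rename the key column so both tables share the name "ACCNUM".
names(array_data)[names(array_data) == "GenBank_ID"] <- "ACCNUM"

# all.x = TRUE keeps microarray rows that have no annotation (GID/SYMBOL become NA).
# Note that merge() sorts the result by the key column by default.
annotated <- merge(array_data, mapper, by = "ACCNUM", all.x = TRUE)
annotated
```

The rename can be skipped by passing by.x = "GenBank_ID" and by.y = "ACCNUM" to merge(). Also keep in mind that if an accession appears more than once in the annotation table, merge() returns one row per match, whereas match() only keeps the first.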