Tony Chiang
Hi Steffen, Sean, Wolfgang,
I have a question about the return value of the getBM() function. It is a data frame object, and in the examples that I have seen, if I want to map from EMBL IDs to Entrez Gene IDs, I still have to request the EMBL IDs as an attribute as well, so that I know what has mapped to what. Example code follows in case my explanation is not clear:
################
library(biomaRt)
ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
filters = listFilters(ensembl)
attributes = listAttributes(ensembl)
##Here are my IDs from String
test = c("9606.ENSP00000045065", "9606.ENSP00000158762", "9606.ENSP00000174653",
         "9606.ENSP00000202967", "9606.ENSP00000204517", "9606.ENSP00000212015",
         "9606.ENSP00000220616", "9606.ENSP00000222008", "9606.ENSP00000222390",
         "9606.ENSP00000223051")
emblID = sapply(strsplit(test, "\\."), function(x) x[2])
##And the code I am using for the mapping is:
getBM(attributes = c("ensembl_peptide_id", "entrezgene", "ensembl_gene_id",
                     "hgnc_automatic_gene_name"),
      filters = "ensembl_peptide_id", values = emblID, mart = ensembl)
##################
So I guess I have two questions. First, would it be a good idea to always return the input values in the output data frame, so that we would not have to request the redundant attribute ("ensembl_peptide_id" in my example)? Second, if you run the code, you will see that ENSP00000045065 did not map at all, so I assume that it is not a valid ensembl_peptide_id (this is a bit strange, since I am using EMBL IDs). Is there some way to make that more transparent, maybe a row of NA values? I realize that these are not terrible things to work around, but would it not make sense to have this? If not, please let me know.
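For concreteness, the kind of workaround meant here can be sketched in base R. This is only an illustration under stated assumptions: the data frame res is a hypothetical stand-in for a getBM() result (its entrezgene values are placeholders, not real Entrez Gene IDs), and nothing below queries BioMart. A left join of the input IDs against the result gives one row of NA values for every ID that did not map:

```r
# Input IDs: the first three from the example above
emblID <- c("ENSP00000045065", "ENSP00000158762", "ENSP00000174653")

# Hypothetical stand-in for a getBM() result. The entrezgene values are
# placeholders, and the first input ID is deliberately absent.
res <- data.frame(
  ensembl_peptide_id = c("ENSP00000158762", "ENSP00000174653"),
  entrezgene = c(111L, 222L),
  stringsAsFactors = FALSE
)

# Left join: keep every input ID, filling NA where nothing was returned
input <- data.frame(ensembl_peptide_id = emblID, stringsAsFactors = FALSE)
full <- merge(input, res, by = "ensembl_peptide_id", all.x = TRUE)

# IDs that produced no row at all in the result
unmapped <- setdiff(emblID, res$ensembl_peptide_id)
```

Note that this still relies on requesting the filter column ("ensembl_peptide_id") as an attribute, since merge() needs a shared key.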
Cheers,
--Tony
> sessionInfo()
R version 2.10.0 Patched (2009-10-27 r50222)
x86_64-apple-darwin9.8.0
locale:
[1] en_US.utf-8/en_US.utf-8/C/C/en_US.utf-8/en_US.utf-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_2.2.0
loaded via a namespace (and not attached):
[1] RCurl_1.2-1 XML_2.6-0