biomaRt Ensembl gene ID to multiple HGNC symbol
foehn ▴ 100
@foehn-16281

Hi,

I'm using the R package biomaRt to map Ensembl gene IDs to HGNC symbols, and I find that some Ensembl IDs map to multiple symbols. For example,

library(biomaRt)
mart = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
      filters = "ensembl_gene_id",
      values = c("ENSG00000187510", "ENSG00000230417", "ENSG00000276085"),
      mart = mart)
  ensembl_gene_id hgnc_symbol
1 ENSG00000187510    C12orf74
2 ENSG00000187510     PLEKHG7
3 ENSG00000230417   LINC00595
4 ENSG00000230417   LINC00856
5 ENSG00000276085      CCL3L1
6 ENSG00000276085      CCL3L3

> packageVersion("biomaRt")
[1] ‘2.38.0’

This is unsurprising, given that we don't expect a 1:1 mapping. What is confusing, however, is that if I look up those IDs on the Ensembl website, I get exactly one symbol for each. That is,

ENSG00000187510 -> C12orf74
ENSG00000230417 -> LINC00856
ENSG00000276085 -> CCL3L1

In theory, biomaRt is just running an SQL query against the online Ensembl database, so we should expect the same results given the same version of the database. Why, then, is there a discrepancy?
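
For anyone reproducing this, here is a minimal sketch for flagging the Ensembl IDs that come back with more than one symbol. It assumes the getBM() result is held in a data frame called `res` (a name introduced here for illustration); the rows are copied from the output above so the snippet runs without a network connection.

```r
# Rows copied from the getBM() output shown above (no network needed).
res <- data.frame(
  ensembl_gene_id = c("ENSG00000187510", "ENSG00000187510",
                      "ENSG00000230417", "ENSG00000230417",
                      "ENSG00000276085", "ENSG00000276085"),
  hgnc_symbol = c("C12orf74", "PLEKHG7",
                  "LINC00595", "LINC00856",
                  "CCL3L1", "CCL3L3"),
  stringsAsFactors = FALSE
)

# Count how many symbols each Ensembl ID received.
n_symbols <- table(res$ensembl_gene_id)

# IDs that map to more than one HGNC symbol.
multi_ids <- names(n_symbols[n_symbols > 1])

# Show only the rows that belong to those IDs.
res[res$ensembl_gene_id %in% multi_ids, ]
```

With the three IDs from this post, every row is flagged, since each ID has two symbols.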

Thanks,

@james-w-macdonald-5106

When you map to the HGNC symbol, you are asking for an external reference. In other words, you are asking which symbols the HUGO consortium says map to this Ensembl ID. You can see those here, and they include both of the symbols you get from the BioMart server.

Right. But I'm curious why http://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000230417;r=10:78179185-78551355 returns LINC00856 as the Name in the Summary section. Does that imply that Ensembl regards LINC00856 as a more canonical symbol than the other?
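
One way to look at this from R is to request the `external_gene_name` attribute next to `hgnc_symbol` in the same query. The sketch below rests on an assumption that is not confirmed in this thread: that `external_gene_name` is the single display name Ensembl attaches to the gene, and so the closest match to the Name on the Summary page. `compare_names` is a hypothetical helper name, and calling it needs biomaRt installed and a network connection.

```r
# Hypothetical helper: fetch Ensembl's gene name alongside the HGNC
# external references for the same IDs, so the two can be compared.
# Assumption: external_gene_name reflects the name shown on the website.
compare_names <- function(ids) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name", "hgnc_symbol"),
    filters = "ensembl_gene_id",
    values = ids,
    mart = mart
  )
}
```

Calling `compare_names(c("ENSG00000187510", "ENSG00000230417", "ENSG00000276085"))` should return one row per HGNC symbol, with the gene name repeated across the rows of each ID.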

That's a question for EBI/EMBL, no? I'm not sure why you would think anybody at the Bioconductor support site would have any particular insight as to their thinking about what symbol is more or less canonical than any other.

Good idea.

According to Ensembl's reply, when a gene has multiple HGNC synonyms they arbitrarily pick one of them for the summary.
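
Since the choice is arbitrary, one option downstream is to keep every symbol rather than pick one. Here is a minimal sketch that collapses the mapping to one row per Ensembl ID, joining the symbols with ";". The names `res`, `symbols_by_id`, `collapsed` and the `hgnc_symbols` column are introduced here for illustration, and the rows are copied from the getBM() output in the question.

```r
# Rows copied from the getBM() output in the original question.
res <- data.frame(
  ensembl_gene_id = c("ENSG00000187510", "ENSG00000187510",
                      "ENSG00000230417", "ENSG00000230417",
                      "ENSG00000276085", "ENSG00000276085"),
  hgnc_symbol = c("C12orf74", "PLEKHG7",
                  "LINC00595", "LINC00856",
                  "CCL3L1", "CCL3L3"),
  stringsAsFactors = FALSE
)

# Group the symbols by Ensembl ID (a named list of character vectors).
symbols_by_id <- split(res$hgnc_symbol, res$ensembl_gene_id)

# One row per ID, with all of its symbols joined into a single string.
collapsed <- data.frame(
  ensembl_gene_id = names(symbols_by_id),
  hgnc_symbols = vapply(symbols_by_id, paste, character(1), collapse = ";"),
  row.names = NULL,
  stringsAsFactors = FALSE
)

collapsed
```

This keeps the ambiguity visible in the table instead of silently dropping one of the symbols.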
