biomaRt returns all NAs for hgnc_symbol
foehn ▴ 100
@foehn-16281
Last seen 3.4 years ago
United States

Hello,

I'm trying to map mouse gene symbols to human ones using the R package biomaRt. Here is my code:

> bm <- useMart(biomart = 'ensembl', dataset = "mmusculus_gene_ensembl")
> SymbolMap <- getBM(attributes = c("mgi_symbol", "hgnc_symbol", "ensembl_gene_id"), filters = "mgi_symbol", mart = bm, values = symbols)
> dim(SymbolMap)
[1] 22154     3
> head(SymbolMap)                                                     
     mgi_symbol hgnc_symbol    ensembl_gene_id
1 0610005C13Rik          NA ENSMUSG00000109644
2 0610009B22Rik          NA ENSMUSG00000007777
3 0610009L18Rik          NA ENSMUSG00000043644
4 0610010F05Rik          NA ENSMUSG00000042208
5 0610010K14Rik          NA ENSMUSG00000020831
6 0610012G03Rik          NA ENSMUSG00000107002

> sum(is.na(SymbolMap[, "hgnc_symbol"]))
[1] 22154
> all(is.na(SymbolMap[, "hgnc_symbol"]))
[1] TRUE
> any(is.na(SymbolMap[, "ensembl_gene_id"]))
[1] FALSE
> length(unique(SymbolMap[, "ensembl_gene_id"]))
[1] 22149

> packageVersion("biomaRt")
[1] ‘2.38.0’

To my surprise, none of the mouse symbols get mapped to a human symbol. However, the input mouse symbols clearly map to 22149 Ensembl gene IDs, so the input itself should not be the problem. I'm confused by the results and would like to know whether anybody has run into a similar issue. Thanks.

...and it should not be any value other than NA because HGNC is for human gene nomenclature.

Note: where the post as rendered shows sumis.na, allis.na and anyis.na, the code is sum(is.na(, all(is.na( and any(is.na(. I don't know why they displayed differently from the preview...

Mike Smith ★ 6.6k
@mike-smith
Last seen 18 hours ago
EMBL Heidelberg

It seems you've stumbled across a combination of attributes that results in an invalid query. If you try running your same query in the Ensembl BioMart web interface you get back the following:

Validation Error: Too many attributes selected for External References

I don't know of any way for biomaRt to check for this, but I suspect whatever the issue is server-side is why you're seeing the complete set of NA values.

As for why it's happening, one possible reason is that the attribute name is really misleading. There's very little documentation, but I think this field is only populated for poorly annotated genes that don't have an MGI symbol but have been assigned some speculative HGNC ortholog, e.g. SPATA24. If that's the case, then your query, which explicitly selects genes with MGI symbols, would only ever return results with no value assigned to this field, hence the NAs.

I assume you actually want to find the set of orthologous human genes for your starting set of MGI symbols. If that's the case, here's one approach to finding the HGNC symbols for orthologous genes. First we'll load the library, initialise the mart, and list some example MGI gene symbols:

library(biomaRt)
symbols <- c("0610005C13Rik", "Cdc6", "Gfap")
bm <- useMart(biomart = 'ensembl', dataset = "mmusculus_gene_ensembl")

Next we get the table of mappings between MGI and Ensembl IDs:

mgi2ensembl <- getBM(attributes = c("mgi_symbol", "ensembl_gene_id"), 
    filters = "mgi_symbol", 
    mart = bm, 
    values = symbols)

We then ask for all human orthologs for those Ensembl IDs. As this is an Ensembl dataset you have to use Ensembl IDs as the primary key here.

ensembl2hgnc <- getBM(attributes = c("hsapiens_homolog_associated_gene_name", "ensembl_gene_id"), 
    filters = "ensembl_gene_id", 
    mart = bm, 
    values = mgi2ensembl$ensembl_gene_id)

Finally we merge our two results into a single table to get the final mapping. A blank value indicates no ortholog was reported in Ensembl.

> merge(mgi2ensembl, ensembl2hgnc)
     ensembl_gene_id    mgi_symbol hsapiens_homolog_associated_gene_name
1 ENSMUSG00000017499          Cdc6                                  CDC6
2 ENSMUSG00000020932          Gfap                                  GFAP
3 ENSMUSG00000109644 0610005C13Rik 
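
For downstream filtering it can be easier to turn those blank values into NA. The following is only a sketch: the two data frames are hand-built stand-ins that copy the values shown above, so it runs without contacting the Ensembl server, and the names `mapping`, `blank` and `no_ortholog` are introduced here purely for illustration.

```r
# Hand-built stand-ins for the two getBM() results above (no server call).
mgi2ensembl <- data.frame(
    mgi_symbol = c("0610005C13Rik", "Cdc6", "Gfap"),
    ensembl_gene_id = c("ENSMUSG00000109644", "ENSMUSG00000017499",
                        "ENSMUSG00000020932"),
    stringsAsFactors = FALSE
)
ensembl2hgnc <- data.frame(
    hsapiens_homolog_associated_gene_name = c("", "CDC6", "GFAP"),
    ensembl_gene_id = c("ENSMUSG00000109644", "ENSMUSG00000017499",
                        "ENSMUSG00000020932"),
    stringsAsFactors = FALSE
)

# Join on the shared key explicitly instead of relying on merge()'s default
# of using every column name the two tables have in common.
mapping <- merge(mgi2ensembl, ensembl2hgnc, by = "ensembl_gene_id")

# Recode blank ortholog names as NA so is.na() works as expected later on.
blank <- mapping$hsapiens_homolog_associated_gene_name == ""
mapping$hsapiens_homolog_associated_gene_name[blank] <- NA

# Mouse symbols for which no human ortholog was reported.
no_ortholog <- mapping$mgi_symbol[
    is.na(mapping$hsapiens_homolog_associated_gene_name)]
print(no_ortholog)
```

With real query results the same recoding should apply unchanged, provided the column names match the attributes requested from getBM().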

There are several ways you can do this with biomaRt, and I wouldn't be surprised if they came up with slightly different results. Mapping between gene symbols and annotations within an organism is fraught with oddities, as is defining orthologs, but the results should be broadly similar.

Thanks for the detailed answer. I understand what you and @swbarnes2 mean. But it's a bit weird that the same code worked years ago...

swbarnes2 ★ 1.4k
@swbarnes2-14086
Last seen 49 minutes ago
San Diego

Not like this. You want to find human orthologs for those mouse symbols. You might need to go symbol -> Ensembl ID -> human orthologs (which won't be 1-1).
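
Because the mapping won't be 1-1, it may help to separate the unambiguous pairs from the rest before using the table. This is a minimal sketch on an invented table: the gene names (`GeneA`, `HUMAN1`, and so on) and the column name `human_symbol` are placeholders, not real query output.

```r
# Hypothetical mouse-to-human mapping; all names are made-up placeholders.
mapping <- data.frame(
    mgi_symbol   = c("GeneA", "GeneA", "GeneB", "GeneC", "GeneD"),
    human_symbol = c("HUMAN1", "HUMAN2", "HUMAN3", "HUMAN3", "HUMAN4"),
    stringsAsFactors = FALSE
)

# Drop exact duplicate rows so repeated pairs are not counted twice.
mapping <- unique(mapping)

# Flag every row whose mouse symbol appears more than once (one-to-many).
multi_mouse <- duplicated(mapping$mgi_symbol) |
    duplicated(mapping$mgi_symbol, fromLast = TRUE)

# Flag every row whose human symbol appears more than once (many-to-one).
multi_human <- duplicated(mapping$human_symbol) |
    duplicated(mapping$human_symbol, fromLast = TRUE)

# Keep only the pairs that are unique in both directions.
one_to_one <- mapping[!multi_mouse & !multi_human, ]
ambiguous  <- mapping[multi_mouse | multi_human, ]
print(one_to_one)
```

Whether to discard the ambiguous pairs or resolve them some other way depends on the analysis; the split just makes the choice explicit.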
