biomaRt has encountered an unknown server error.
Amparo • 0
United Kingdom

Hi there!

I am trying to use the biomaRt package to find the Ensembl IDs for my genes, which are identified by HGNC symbols. I am looping over the rows one at a time because some genes apparently do not have an Ensembl ID, so I wanted to check each row individually. However, I keep getting a server error. I have tried different mirrors (www, asia and useast), but none of them work. My code looks fine to me, but maybe there is an error I am unable to spot?

mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl", host = "https://www.ensembl.org")

all_migraine$ensembl_id <- NA

for (i in 1:nrow(all_migraine)) {
  ensembl_matrix <- getBM(attributes = 'ensembl_gene_id', 
                          filters = 'hgnc_symbol', 
                          values = all_migraine$mappedGenes[i], 
                          mart = mart)
  ensembl_id <- ensembl_matrix[1, 1]
  all_migraine$ensembl_id[i] <- ifelse(!is.na(ensembl_id), ensembl_id, NA)
}

Error: biomaRt has encountered an unknown server error. HTTP error code: 405
Please report this on the Bioconductor support site at https://support.bioconductor.org/
Consider trying one of the Ensembl mirrors (for more details look at ?useEnsembl)

Session info:
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

I am new to bioinformatics, so any suggestion on what may be the issue would totally help! Thanks a bunch!

biomaRt host useEnsembl • 1.8k views
@james-w-macdonald-5106
United States

You should not query the Biomart server (or any server) like that. Querying a server in a tight loop is considered bad behavior (it looks like a DoS attack) and might get your IP banned. Instead, do the query once for all of your symbols, and include the hgnc_symbol in your attributes as well. Then you will get all your HUGO symbols back with an NA value if there's no Ensembl ID.

I should also point out that the Biomart server can be a bit temperamental, and it's sometimes difficult enough to connect once. Trying to connect repeatedly will only exacerbate that issue.

Yes, the Biomart service seems to be really bad this week and it's affecting all the mirror sites too. Unfortunately there's nothing biomaRt can do if Ensembl's servers aren't working very well.
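
If you do want to fall back to a mirror when the main site misbehaves, a minimal sketch might look like the following. It tries each mirror once, in order, and stops at the first one that connects. The helper name `connect_with_fallback` and the injectable `connect` argument are made up for illustration; the `mirror` argument of `useEnsembl()` is real, but whether any mirror answers on a given day is up to Ensembl.

```r
## Hypothetical helper: try each Ensembl mirror once, in order, and return
## the first Mart object that connects. `connect` is injectable so the
## fallback logic can be exercised without network access.
connect_with_fallback <- function(mirrors = c("www", "useast", "asia"),
                                  connect = function(m) {
                                    biomaRt::useEnsembl(biomart = "genes",
                                                        dataset = "hsapiens_gene_ensembl",
                                                        mirror = m)
                                  }) {
  for (m in mirrors) {
    mart <- tryCatch(connect(m), error = function(e) {
      message("Mirror '", m, "' failed: ", conditionMessage(e))
      NULL
    })
    if (!is.null(mart)) return(mart)
  }
  stop("None of the mirrors responded; try again later.")
}

## Offline demonstration with a stand-in connector that fails for "www"
fake_connect <- function(m) {
  if (m == "www") stop("HTTP error code: 405")
  paste("connected to", m)
}
demo_mart <- connect_with_fallback(connect = fake_connect)
```

With the default `connect`, calling `mart <- connect_with_fallback()` makes at most one connection attempt per mirror, which keeps the load on the servers low.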

As James says, you really don't want to be querying Ensembl BioMart one gene at a time. It will take forever and is very prone to failure and/or getting your IP banned. I'm not even sure how it's managed to return a 405 error; that indicates trying to access the server with an HTTP method it doesn't support, which biomaRt shouldn't be able to do.

One thing I will point out from James' answer is that if you query for an HGNC symbol that doesn't have a matching Ensembl ID it won't return an NA - it will just be dropped silently. This is because Ensembl BioMart is totally centred around the Ensembl IDs. Again, as James pointed out, you want to make sure you're returning both the query column and the thing you're looking for. You can then use this to make sure you can also identify the HGNC symbols that don't have a match. Here's a small example.

library(biomaRt)

## symbols we're interested in.  
## One of these doesn't have a matching Ensembl ID
hgnc_symbols <- c("BRCA2", "FOOBAA")

## perform the search and return both Ensembl and HGNC IDs
mart <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")
ensembl_matrix <- getBM(attributes = c('ensembl_gene_id', 'hgnc_symbol'), 
                          filters = 'hgnc_symbol', 
                          values = hgnc_symbols, 
                          mart = mart)

## The FOOBAA result has been dropped here as there was no match
ensembl_matrix
#>   ensembl_gene_id hgnc_symbol
#> 1 ENSG00000139618       BRCA2

## merge the original search list and the ensembl results
## keep all elements from the original and insert NAs if needed for Ensembl ID
merge(hgnc_symbols, ensembl_matrix,
      by.x = "x", by.y = "hgnc_symbol",
      all.x = TRUE, all.y = FALSE)

#>        x ensembl_gene_id
#> 1  BRCA2 ENSG00000139618
#> 2 FOOBAA            <NA>
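
If you would rather write the IDs straight back into your original data frame, keeping its row order and any repeated symbols, `match()` is an alternative to `merge()`. The sketch below runs offline: `all_migraine` and `bm_result` are toy stand-ins for your data and for the result of a single `getBM()` call made with `values = unique(all_migraine$mappedGenes)`. Like the `[1, 1]` indexing in the question, it keeps only the first Ensembl ID when a symbol has several.

```r
## Toy stand-ins: `all_migraine` mimics the data frame from the question and
## `bm_result` mimics what a single getBM() call might return.
all_migraine <- data.frame(mappedGenes = c("BRCA2", "FOOBAA", "BRCA2"),
                           stringsAsFactors = FALSE)
bm_result <- data.frame(ensembl_gene_id = "ENSG00000139618",
                        hgnc_symbol = "BRCA2",
                        stringsAsFactors = FALSE)

## match() gives the first matching row of bm_result for each symbol,
## or NA when the symbol was dropped, so row order and count are preserved.
idx <- match(all_migraine$mappedGenes, bm_result$hgnc_symbol)
all_migraine$ensembl_id <- bm_result$ensembl_gene_id[idx]
all_migraine
```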

Thank you everyone! Those are really good suggestions, and now I know a little more. I will try what you've recommended and see if I can make it work!

You can always use an OrgDb to do the mapping as well, which will not have an issue with access to an online resource. However there are still trade-offs. The OrgDb packages are NCBI-centric, which means any mapping of HGNC symbol to Ensembl Gene ID will actually be HGNC -> NCBI Gene ID -> Ensembl Gene ID. It's the last step that can be problematic as mapping between NCBI and Ensembl IDs is not necessarily consistent. But it is an available resource.

> library(org.Hs.eg.db)
> symb <- head(keys(org.Hs.eg.db, "SYMBOL"), 20)
> symb
 [1] "A1BG"     "A2M"      "A2MP1"   
 [4] "NAT1"     "NAT2"     "NATP"    
 [7] "SERPINA3" "AADAC"    "AAMP"    
[10] "AANAT"    "AARS1"    "AAVS1"   
[13] "ABAT"     "ABCA1"    "ABCA2"   
[16] "ABCA3"    "ABCB7"    "ABCF1"   
[19] "ABCA4"    "ABL1"    

## map assuming you really do have current symbols

> select(org.Hs.eg.db, symb, "ENSEMBL", "SYMBOL")
'select()' returned 1:many
mapping between keys and columns
     SYMBOL         ENSEMBL
1      A1BG ENSG00000121410
2       A2M ENSG00000175899
3     A2MP1 ENSG00000291190
4      NAT1 ENSG00000171428
5      NAT2 ENSG00000156006
6      NATP            <NA>
7  SERPINA3 ENSG00000196136
8     AADAC ENSG00000114771
9      AAMP ENSG00000127837
10    AANAT ENSG00000129673
11    AARS1 ENSG00000090861
12    AAVS1            <NA>
13     ABAT ENSG00000183044
14    ABCA1 ENSG00000165029
15    ABCA2 ENSG00000107331
16    ABCA3 ENSG00000167972
17    ABCB7 ENSG00000131269
18    ABCF1 ENSG00000204574
19    ABCF1 ENSG00000236149
20    ABCF1 ENSG00000231129
21    ABCF1 ENSG00000232169
22    ABCF1 ENSG00000236342
23    ABCF1 ENSG00000225989
24    ABCF1 ENSG00000206490
25    ABCA4 ENSG00000198691
26     ABL1 ENSG00000097007

## or possibly safer, assume some might be old symbols
> select(org.Hs.eg.db, symb, "ENSEMBL", "ALIAS")
'select()' returned 1:many
mapping between keys and columns
      ALIAS         ENSEMBL
1      A1BG ENSG00000121410
2       A2M ENSG00000175899
3       A2M ENSG00000211890
4     A2MP1 ENSG00000291190
5      NAT1 ENSG00000171428
6      NAT1 ENSG00000110321
7      NAT1 ENSG00000103546
8      NAT1 ENSG00000188338
9      NAT2 ENSG00000156006
10     NAT2 ENSG00000111371
11     NATP            <NA>
12 SERPINA3 ENSG00000196136
13    AADAC ENSG00000114771
14     AAMP ENSG00000127837
15    AANAT ENSG00000129673
16    AARS1 ENSG00000090861
17    AAVS1 ENSG00000125503
18     ABAT ENSG00000183044
19    ABCA1 ENSG00000165029
20    ABCA2 ENSG00000107331
21    ABCA3 ENSG00000167972
22    ABCB7 ENSG00000131269
23    ABCF1 ENSG00000204574
24    ABCF1 ENSG00000236149
25    ABCF1 ENSG00000231129
26    ABCF1 ENSG00000232169
27    ABCF1 ENSG00000236342
28    ABCF1 ENSG00000225989
29    ABCF1 ENSG00000206490
30    ABCA4 ENSG00000198691
31     ABL1 ENSG00000097007

## and here is why it can be a bit fraught. 

> select(org.Hs.eg.db, symb, c("ENSEMBL","ENTREZID"), "SYMBOL")
'select()' returned 1:many
mapping between keys and columns
     SYMBOL         ENSEMBL ENTREZID
1      A1BG ENSG00000121410        1
2       A2M ENSG00000175899        2
3     A2MP1 ENSG00000291190        3
4      NAT1 ENSG00000171428        9
5      NAT2 ENSG00000156006       10
6      NATP            <NA>       11
7  SERPINA3 ENSG00000196136       12
8     AADAC ENSG00000114771       13
9      AAMP ENSG00000127837       14
10    AANAT ENSG00000129673       15
11    AARS1 ENSG00000090861       16
12    AAVS1            <NA>       17
13     ABAT ENSG00000183044       18
14    ABCA1 ENSG00000165029       19
15    ABCA2 ENSG00000107331       20
16    ABCA3 ENSG00000167972       21
17    ABCB7 ENSG00000131269       22
18    ABCF1 ENSG00000204574       23
19    ABCF1 ENSG00000236149       23
20    ABCF1 ENSG00000231129       23
21    ABCF1 ENSG00000232169       23
22    ABCF1 ENSG00000236342       23
23    ABCF1 ENSG00000225989       23
24    ABCF1 ENSG00000206490       23
25    ABCA4 ENSG00000198691       24
26     ABL1 ENSG00000097007       25

As an example, ABCF1 maps to a single NCBI Gene ID (23), but then it maps to 7 Ensembl Gene IDs. But if you go to genenames.org, it says the mapping is to a single Ensembl Gene ID, apparently because they have manually curated it and someone says that's the one.
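
To find such cases in your own results, you can count the distinct Ensembl IDs per symbol. This is only a sketch on a toy subset of the output above; the variable names (`mapping`, `n_ids`, `ambiguous`, `unmapped`) are made up for illustration, and with real data `mapping` would be whatever `select()` returned.

```r
## Toy subset of the select() output shown above
mapping <- data.frame(
  SYMBOL  = c("A1BG", "NATP", "ABCF1", "ABCF1", "ABCF1"),
  ENSEMBL = c("ENSG00000121410", NA, "ENSG00000204574",
              "ENSG00000236149", "ENSG00000231129"),
  stringsAsFactors = FALSE
)

## Count distinct non-missing Ensembl IDs per symbol
n_ids <- tapply(mapping$ENSEMBL, mapping$SYMBOL,
                function(ids) length(unique(ids[!is.na(ids)])))

ambiguous <- names(n_ids)[n_ids > 1]   # symbols with more than one Ensembl ID
unmapped  <- names(n_ids)[n_ids == 0]  # symbols with no Ensembl ID
```

Symbols in `ambiguous` need a decision about which Ensembl ID to keep, and symbols in `unmapped` have none in the OrgDb.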

bd2000 ▴ 30
United Kingdom

It does seem like biomaRt and Ensembl have been down for a couple of weeks...

https://www.ensembl.info/2024/01/18/downtime-of-the-ensembl-search-and-asia-mirror/#comments

Oh... okay, that's such a shame, if they're down that's totally not helping. Thanks for letting me know, I will keep an eye on that!

I had access yesterday, but not today.
