biomaRt error: Query ERROR: caught BioMart::Exception: non-BioMart die():
Georg Otto ▴ 120
@georg-otto-6333
Last seen 6.0 years ago
United Kingdom

Hi,

I have a vector with Ensembl gene IDs

> head(gene.id)
[1] "ENSG00000223972" "ENSG00000227232" "ENSG00000278267" "ENSG00000243485"
[5] "ENSG00000274890" "ENSG00000237613"

I am trying to annotate the IDs using biomaRt

> library(biomaRt)

> ensembl <- useMart("ENSEMBL_MART_ENSEMBL",
                   dataset = "hsapiens_gene_ensembl",
                   host = "www.ensembl.org")

However, I get an error. The curious thing is that I get this error only when my vector has length 993 or longer, never when it is shorter, using a random selection of IDs. So this always works:

> mat.cpm.annot <- biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id", "hgnc_symbol", "description"), filter = "ensembl_gene_id", sample(gene.id, 992), mart = ensembl, uniqueRows = TRUE)

And this gives me an error:

> mat.cpm.annot <- biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id", "hgnc_symbol", "description"), filter = "ensembl_gene_id", sample(gene.id, 993), mart = ensembl, uniqueRows = TRUE)

Error in biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id", "hgnc_symbol",  :
  Query ERROR: caught BioMart::Exception: non-BioMart die():
not well-formed (invalid token) at line 1, column 16292, byte 16292 at /nfs/public/release/ensweb-software/sharedsw/2017_04_03/linuxbrew/Cellar/perl/5.24.1/lib/perl5/site_perl/5.24.1/x86_64-linux-thread-multi/XML/Parser.pm line 187.
XML::Simple called at /nfs/public/release/ensweb/latest/live/mart/www_90/biomart-perl/lib/BioMart/Query.pm line 1935.

 

> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

Matrix products: default
BLAS: /share/apps/cto/packages/R/3.4.2/lib64/R/lib/libRblas.so
LAPACK: /share/apps/cto/packages/R/3.4.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.32.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.13         IRanges_2.10.5       XML_3.98-1.9        
 [4] digest_0.6.12        bitops_1.0-6         DBI_0.7             
 [7] stats4_3.4.2         RSQLite_2.0          rlang_0.1.2         
[10] blob_1.1.0           S4Vectors_0.14.7     tools_3.4.2         
[13] bit64_0.9-7          Biobase_2.36.2       RCurl_1.95-4.8      
[16] bit_1.1-12           parallel_3.4.2       compiler_3.4.2      
[19] BiocGenerics_0.22.1  AnnotationDbi_1.38.2 memoise_1.1.0       
[22] tibble_1.3.4        
 

 

Any idea what is going on?

Cheers,

Georg

 

 

 

 

Mike Smith ★ 6.6k
@mike-smith
Last seen 6 hours ago
EMBL Heidelberg

That's a new error to me!  I suspect that something is wrong with the back end database, rather than with the biomaRt package.

One thing you can try is to use one of the mirror services, to see if that is unaffected, e.g.:

ensembl <- useMart("ENSEMBL_MART_ENSEMBL",
                   dataset = "hsapiens_gene_ensembl",
                   host = "asia.ensembl.org")

Alternatively, you can try the development version of biomaRt.  It's not recommended to run queries with more than 500 search values.  In practice it's often fine, but occasionally results won't be returned and you'll have no idea that's happened.  The devel package has a modification that breaks your query down into chunks of 500, runs them independently, and then splices the results back together.  Since your issue seems so deterministic, perhaps this modification will be sufficient.  You can install it using:

BiocInstaller::biocLite('grimbough/biomaRt')

A quick test for me suggests the uniqueRows argument won't work properly at the moment, but you can do it in post-processing yourself.
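To illustrate the chunking idea, here is a minimal sketch in plain R. It is not the code used inside biomaRt: `query_in_chunks()` and `mock_query()` are hypothetical names made up for this example, and the mock stands in for a real `getBM()` call so the snippet runs offline. It also drops duplicate rows afterwards, as a stand-in for `uniqueRows`.

```r
# Hypothetical helper, not part of biomaRt: run a query in chunks of
# `chunk_size` values and bind the per-chunk results together.
query_in_chunks <- function(values, query_fun, chunk_size = 500) {
  values <- unique(values)
  if (length(values) == 0) {
    return(query_fun(values))
  }
  # Label each value with its chunk number: 1, 1, ..., 2, 2, ...
  chunk_id <- ceiling(seq_along(values) / chunk_size)
  chunks <- split(values, chunk_id)
  results <- lapply(chunks, query_fun)
  combined <- do.call(rbind, results)
  # Remove duplicate rows in post-processing
  combined <- unique(combined)
  rownames(combined) <- NULL
  combined
}

# Stand-in for a real query so the sketch runs without a server
mock_query <- function(ids) {
  data.frame(ensembl_gene_id = ids,
             n_char = nchar(ids),
             stringsAsFactors = FALSE)
}

ids <- sprintf("ENSG%011d", 1:1200)  # made-up IDs, for illustration only
annot <- query_in_chunks(ids, mock_query)
nrow(annot)

# With a real mart object this might look like (assumption, untested here):
# annot <- query_in_chunks(gene.id, function(ids)
#   biomaRt::getBM(attributes = c("ensembl_gene_id", "hgnc_id",
#                                 "hgnc_symbol", "description"),
#                  filters = "ensembl_gene_id", values = ids, mart = ensembl))
```

Passing the query as a function keeps the chunking logic separate from the server call, so the same wrapper can be tested with a mock first.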


Those answers are still valid, but I want to add that I don't experience the problem you're seeing, so maybe it has already been fixed on the Ensembl side.


Thanks a lot. I tried both suggested solutions. With the mirror service I got the same error. Installing and using the devel package, however, made the error go away. Just to clarify: the recommendation not to run queries with more than 500 search values relates to the devel package, not the release package, right? I have routinely used biomaRt to run queries with thousands of search values.


The 500-value limit has always applied to queries sent to BioMart, whether via biomaRt or through the Ensembl web interface.  For the most part you can submit more than 500 filter values and it will be fine, but if there is a problem you won't know anything about it: it happens silently.
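One rough way to spot such silent losses is to compare the IDs you submitted with the IDs that came back. The sketch below uses a hypothetical helper, `missing_ids()`, and a simulated result rather than real server output. Note that an ID can also be absent for legitimate reasons (for example, it is not in the queried release), so a non-empty answer is only a hint to look closer.

```r
# Hypothetical helper: which submitted IDs are absent from the result?
missing_ids <- function(submitted, result, id_col = "ensembl_gene_id") {
  setdiff(unique(submitted), result[[id_col]])
}

submitted <- c("ENSG00000223972", "ENSG00000227232", "ENSG00000278267")

# Simulated result in which one ID did not come back
result <- data.frame(
  ensembl_gene_id = c("ENSG00000223972", "ENSG00000278267"),
  stringsAsFactors = FALSE
)

missing_ids(submitted, result)
```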

This is obviously really undesirable, hence the patch.  I only committed this to the devel branch in case it broke some other functionality, but no one has reported anything, and it's now part of the new release branch that was released this week.

If you are submitting queries with thousands of gene IDs or the like you should definitely be using biomaRt version 2.33.1 or newer just to be on the safe side. 
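If you want a script to check this before running a large query, something along these lines should work. `is_at_least()` is a hypothetical helper written for this example; `package_version()` and `packageVersion()` are standard R functions.

```r
# Hypothetical helper: is an installed version at least the minimum?
is_at_least <- function(installed, minimum = "2.33.1") {
  package_version(installed) >= package_version(minimum)
}

is_at_least("2.32.1")  # the version from the sessionInfo() in the question
is_at_least("2.33.1")

# Check the installed biomaRt, if there is one
if (requireNamespace("biomaRt", quietly = TRUE)) {
  print(is_at_least(as.character(utils::packageVersion("biomaRt"))))
}
```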

Georg Otto ▴ 120
@georg-otto-6333
Last seen 6.0 years ago
United Kingdom

I can confirm that upgrading Bioconductor to version 3.6 solved the problem.
