getBM in loop
@tiphaine-martin-6416
Last seen 6.1 years ago
France

Hi,

Could you advise me on how to run "getBM" in a loop, so that when the connection times out the query is rerun automatically until I get a result?

I see that there is an option called curl, but I have never used it, so I don't know how it works.

 

library(biomaRt)

# Connect to Ensembl

mart <- useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")

# Extract hgnc_symbol and gene_biotype for each ensembl_gene_id in my list
hgnc <- getBM(attributes=c('ensembl_gene_id','hgnc_symbol','gene_biotype'),
                   filters = 'ensembl_gene_id', values = mat$ensembl_gene_id, mart = mart)
Batch submitting query [=====================================================--------]  88% eta: 11s
Error in curl::curl_fetch_memory(url, handle = handle) :
  Timeout was reached: Connection timed out after 10003 milliseconds

 

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.34.1 edgeR_3.20.3   limma_3.34.5

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.14         AnnotationDbi_1.40.0 magrittr_1.5        
 [4] BiocGenerics_0.24.0  progress_1.1.2       IRanges_2.12.0      
 [7] bit_1.1-12           lattice_0.20-35      R6_2.2.2            
[10] rlang_0.1.6          httr_1.3.1           stringr_1.2.0       
[13] blob_1.1.0           tools_3.4.3          parallel_3.4.3      
[16] grid_3.4.3           Biobase_2.38.0       DBI_0.7             
[19] assertthat_0.2.0     bit64_0.9-7          digest_0.6.13       
[22] tibble_1.4.1         S4Vectors_0.16.0     bitops_1.0-6        
[25] RCurl_1.95-4.9       memoise_1.1.0        RSQLite_2.0         
[28] stringi_1.1.6        compiler_3.4.3       pillar_1.0.1        
[31] prettyunits_1.0.2    stats4_3.4.3         XML_3.98-1.9        
[34] locfit_1.5-9.1      

 

biomart ensembl curl • 2.8k views

I can't reproduce this error at the moment, but I suspect the cause was introduced a few weeks ago, when I made some modifications necessitated by changes to Ensembl.  I'll take a closer look and try to work around it.

@james-w-macdonald-5106
Last seen 16 minutes ago
United States

I have never seen the message about batch submitting query, but you shouldn't need to do a loop anyway. I can get over 61,000 IDs mapped in maybe 20 seconds in one go.

> ensembl <- keys(EnsDb.Hsapiens.v79, keytype = "GENEID")
> head(ensembl)
[1] "ENSG00000000003" "ENSG00000000005" "ENSG00000000419" "ENSG00000000457"
[5] "ENSG00000000460" "ENSG00000000938"
> length(ensembl)
[1] 65774
> dat <- getBM(c('ensembl_gene_id','hgnc_symbol','gene_biotype'), "ensembl_gene_id", ensembl, mart)
> dim(dat)
[1] 61351     3
> head(dat)
  ensembl_gene_id hgnc_symbol   gene_biotype
1 ENSG00000000003      TSPAN6 protein_coding
2 ENSG00000000005        TNMD protein_coding
3 ENSG00000000419        DPM1 protein_coding
4 ENSG00000000457       SCYL3 protein_coding
5 ENSG00000000460    C1orf112 protein_coding
6 ENSG00000000938         FGR protein_coding

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base     

other attached packages:
 [1] EnsDb.Hsapiens.v79_2.99.0 ensembldb_2.2.0          
 [3] AnnotationFilter_1.2.0    GenomicFeatures_1.30.0   
 [5] GenomicRanges_1.30.0      GenomeInfoDb_1.14.0      
 [7] org.Hs.eg.db_3.5.0        AnnotationDbi_1.40.0     
 [9] IRanges_2.12.0            S4Vectors_0.16.0         
[11] Biobase_2.38.0            BiocGenerics_0.24.0      
[13] biomaRt_2.34.1           

 


I request more than 20,000 genes per tissue, and I run this request for several tissues. Sorry, my question was not very clear.

I don't loop over each gene, but over tissues. Some of these requests fail. When that happens, I manually rerun the request for the failing tissue and for the tissues after it, but I would like this to happen automatically.

tiphaine
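
One way to automate the rerun described above is to wrap each per-tissue request in a small retry helper built on base R's tryCatch. This is only a sketch under assumptions: `retry`, `fetch_by_tissue` and `ids_by_tissue` (a named list with one vector of Ensembl gene IDs per tissue) are hypothetical names, not part of biomaRt, and the attempt count and waiting time are arbitrary choices.

```r
# Hypothetical helper (not part of biomaRt): call fun() until it succeeds
# or max_attempts is reached, pausing `wait` seconds between attempts.
retry <- function(fun, max_attempts = 5, wait = 10) {
  last_error <- NULL
  for (attempt in seq_len(max_attempts)) {
    # Turn an error into a return value so the loop can continue
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    message("Attempt ", attempt, " failed: ", conditionMessage(last_error))
    if (attempt < max_attempts) Sys.sleep(wait)
  }
  stop("All ", max_attempts, " attempts failed. Last error: ",
       conditionMessage(last_error))
}

# Assumed input: ids_by_tissue is a named list with one character vector of
# Ensembl gene IDs per tissue. Defined here but not called, because it needs
# a live connection to Ensembl.
fetch_by_tissue <- function(ids_by_tissue, mart) {
  lapply(ids_by_tissue, function(ids) {
    retry(function() {
      biomaRt::getBM(
        attributes = c("ensembl_gene_id", "hgnc_symbol", "gene_biotype"),
        filters = "ensembl_gene_id", values = ids, mart = mart
      )
    })
  })
}
```

Calling `fetch_by_tissue(ids_by_tissue, mart)` would then return one data frame per tissue, and a timeout in one tissue only repeats that tissue's request rather than the whole run.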


The batch query message is something I added in the latest release of biomaRt.  Ensembl BioMart can have issues if the list of filter values is large, so biomaRt now internally chunks the values into batches of 500 and submits them sequentially.  This can take a little longer, so it prints a progress bar to show that something is happening.
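
To illustrate the idea of chunking, here is a minimal sketch of splitting a vector of filter values into batches of 500. It is not the code biomaRt uses internally; `chunk_values` is a hypothetical helper and the IDs are made up for the example.

```r
# Hypothetical helper: split a vector into consecutive batches of at most
# `size` elements. Illustration only, not biomaRt's internal implementation.
chunk_values <- function(values, size = 500) {
  split(values, ceiling(seq_along(values) / size))
}

# Made-up IDs, just to have something to split
ids <- sprintf("ENSG%011d", 1:1234)
batches <- chunk_values(ids)

length(batches)   # 3 batches
lengths(batches)  # 500, 500 and 234 values
```

Each batch could then be sent as its own query and the resulting data frames combined with `do.call(rbind, ...)`.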

Mike Smith ★ 6.6k
@mike-smith
Last seen 33 minutes ago
EMBL Heidelberg

I've changed the default timeout from 10 seconds to 30.  Without being able to run the code examples on your exact setup it's hard to know whether this will be sufficient, but fingers crossed.  Please report back here if it still fails; we may then need to do a more extensive diagnosis.

This update is available in biomaRt version 2.35.5.  It'll appear in the devel branch shortly, but you can get the updated version immediately using:

BiocInstaller::biocLite('grimbough/biomaRt')
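
To check whether an installation already includes this change, the installed version can be compared with 2.35.5. A minimal sketch: `has_timeout_fix` is a hypothetical helper, and in a real session the version string would come from `packageVersion("biomaRt")`.

```r
# Hypothetical helper: TRUE if the given version is at least the version
# that contains the longer timeout (2.35.5, as stated above).
has_timeout_fix <- function(installed, required = "2.35.5") {
  package_version(installed) >= package_version(required)
}

has_timeout_fix("2.34.1")  # FALSE: the version in the question's sessionInfo()
has_timeout_fix("2.35.5")  # TRUE

# In a live session (requires biomaRt to be installed):
# has_timeout_fix(as.character(packageVersion("biomaRt")))
```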