biomaRt: drerio_gene_ensembl dataset missing
@antonio-miguel-de-jesus-domingues-5182
Germany

Whilst running a `RIPSeeker` analysis, I noticed that the dataset `drerio_gene_ensembl`, which used to be available via `biomaRt`, is no longer listed or accessible. To test this, I first upgraded my Bioconductor installation to make sure I am working with the latest version of `biomaRt` (2.34.0).


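```r
# Load the Bioconductor installer, then upgrade to the newest Bioconductor release
source("https://bioconductor.org/biocLite.R")
biocLite("BiocUpgrade")
```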

I then followed the instructions in the [vignette](https://bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/doc/biomaRt.html) and connected to `ensembl`:


```r
library("biomaRt")
```

```
## Loading required package: methods
```

```r
ensembl <- useMart("ensembl")
dat <- listDatasets(ensembl)
str(dat)
```

```
## 'data.frame':    33 obs. of  3 variables:
##  $ dataset    :Class 'AsIs'  chr [1:33] "amelanoleuca_gene_ensembl" "dordii_gene_ensembl" "mpahari_gene_ensembl" "trubripes_gene_ensembl" ...
##  $ description:Class 'AsIs'  chr [1:33] "Panda genes (ailMel1)" "Kangaroo rat genes (Dord_2.0)" "Shrew mouse genes (PAHARI_EIJ_v1.1)" "Fugu genes (FUGU 4.0)" ...
##  $ version    :Class 'AsIs'  chr [1:33] "ailMel1" "Dord_2.0" "PAHARI_EIJ_v1.1" "FUGU 4.0" ...
```

```r
dim(dat)
```

```
## [1] 33  3
```

```r
dat[grepl("hsapiens", dat$dataset),]
```

```
## [1] dataset     description version    
## <0 rows> (or 0-length row.names)
```

```r
dat[grepl("drerio", dat$dataset),]
```

```
## [1] dataset     description version    
## <0 rows> (or 0-length row.names)
```

The tutorial lists 85 datasets, whereas now it retrieves at most 50. Weirdly, I noticed that the number changed when I repeated this, so I wrapped the calls in a loop and repeated the analysis several times:


```r
for (i in 1:10){
    print(paste("cycle:",i))
    ensembl <- useMart("ensembl")
    dat <- listDatasets(ensembl)
    print(dim(dat))
    print(paste("Is drerio present?", "drerio_gene_ensembl" %in% dat$dataset))
}
```

```
## [1] "cycle: 1"
## [1] 33  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 2"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 3"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 4"
## [1] 33  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 5"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 6"
## [1] 33  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 7"
## [1] 46  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 8"
## [1] 50  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 9"
## [1] 45  3
## [1] "Is drerio present? FALSE"
## [1] "cycle: 10"
## [1] 46  3
## [1] "Is drerio present? FALSE"
```

The number of datasets listed varies with almost every run. Importantly for me, `drerio_gene_ensembl` was missing in all the tests except one.

This instability leads to:

- errors when using packages which depend on a connection to ensembl, for instance `RIPSeeker`.
- reproducibility errors for anyone not using one of the stable datasets (I did not test which ones were always present, but hsapiens appears to be always available).
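
To find out which datasets are stable, one option is to repeat the listing several times and tally how often each dataset name appears. The sketch below assumes the listings have already been collected as a list of data frames with a `dataset` column (the shape `listDatasets()` returns). The helper name `dataset_presence()` is hypothetical, and the three example runs are made up so the snippet runs without a network connection.

```r
# Hypothetical helper: fraction of runs in which each dataset name was listed.
# `listings` is a list of data frames, each with a `dataset` column.
dataset_presence <- function(listings) {
  # Count each name at most once per run
  names_per_run <- lapply(listings, function(d) unique(as.character(d$dataset)))
  counts <- table(unlist(names_per_run))
  # Convert counts to a fraction of all runs, most stable datasets first
  frac <- as.vector(counts) / length(listings)
  names(frac) <- names(counts)
  sort(frac, decreasing = TRUE)
}

# Made-up listings standing in for three runs of listDatasets(useMart("ensembl"))
runs <- list(
  data.frame(dataset = c("hsapiens_gene_ensembl", "drerio_gene_ensembl"),
             stringsAsFactors = FALSE),
  data.frame(dataset = "hsapiens_gene_ensembl", stringsAsFactors = FALSE),
  data.frame(dataset = "hsapiens_gene_ensembl", stringsAsFactors = FALSE)
)
presence <- dataset_presence(runs)
print(presence)
```

With a live connection, the listings could be gathered with `replicate(10, listDatasets(useMart("ensembl")), simplify = FALSE)`; any dataset with a value below 1 was missing from at least one run.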

I "solved" the issue by using an archive host:


```r
# Connect to the October 2016 Ensembl archive instead of the current release
host <- "oct2016.archive.ensembl.org"
ensembl <- useMart("ensembl", host = host)
dat <- listDatasets(ensembl)
str(dat)
```

```
## 'data.frame':    69 obs. of  3 variables:
##  $ dataset    :Class 'AsIs'  chr [1:69] "oanatinus_gene_ensembl" "cporcellus_gene_ensembl" "gaculeatus_gene_ensembl" "itridecemlineatus_gene_ensembl" ...
##  $ description:Class 'AsIs'  chr [1:69] "Ornithorhynchus anatinus genes (OANA5)" "Cavia porcellus genes (cavPor3)" "Gasterosteus aculeatus genes (BROADS1)" "Ictidomys tridecemlineatus genes (spetri2)" ...
##  $ version    :Class 'AsIs'  chr [1:69] "OANA5" "cavPor3" "BROADS1" "spetri2" ...
```

```r
dim(dat)
```

```
## [1] 69  3
```

```r
dat[grepl("drerio", dat$dataset),]
```

```
##                dataset                description version
## 40 drerio_gene_ensembl Danio rerio genes (GRCz10)  GRCz10
```

but using older annotations is a bit of a hack. Has anything changed recently in Ensembl or `biomaRt` that explains the missing dataset and this instability?


```r
sessionInfo()
```

```
## R version 3.4.2 (2017-09-28)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
##
## attached base packages:
## [1] methods   stats     graphics  grDevices utils     datasets  base     
##
## other attached packages:
## [1] biomaRt_2.34.0 knitr_1.17    
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.14         AnnotationDbi_1.40.0 magrittr_1.5        
##  [4] BiocGenerics_0.24.0  progress_1.1.2       IRanges_2.12.0      
##  [7] bit_1.1-12           R6_2.2.2             rlang_0.1.4         
## [10] stringr_1.2.0        blob_1.1.0           tools_3.4.2         
## [13] parallel_3.4.2       Biobase_2.38.0       DBI_0.7             
## [16] assertthat_0.2.0     bit64_0.9-7          digest_0.6.12       
## [19] tibble_1.3.4         S4Vectors_0.16.0     bitops_1.0-6        
## [22] RCurl_1.95-4.8       memoise_1.1.0        RSQLite_2.0         
## [25] evaluate_0.10.1      stringi_1.1.6        compiler_3.4.2      
## [28] prettyunits_1.0.2    stats4_3.4.2         XML_3.98-1.9
```

biomart ensembl ripseeker

@moderators: I could not format the post properly due to an error:

Language "fr" is not one of the supported languages ['en']!

The post was copy-pasted from a Markdown document generated with knitr, so I have no idea what triggered it.

Instead of using an old archived version, you could go for version 90 from August (http://Aug2017.archive.ensembl.org), which in many cases will differ only slightly from the very latest release.
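
A related approach is to try the live server first and fall back to an archive only when the dataset is missing. This is a minimal sketch, not a biomaRt feature: `pick_host()` and the fake listings are hypothetical, and the listing function is passed in as an argument so the logic can be checked offline.

```r
# Hypothetical helper: return the first host whose dataset listing contains `dataset`.
# `list_fun` takes a host name and returns a data frame with a `dataset` column.
pick_host <- function(hosts, dataset, list_fun) {
  for (h in hosts) {
    # An unreachable or failing host is skipped rather than aborting the search
    listing <- tryCatch(list_fun(h), error = function(e) NULL)
    if (!is.null(listing) && dataset %in% listing$dataset) {
      return(h)
    }
  }
  stop("dataset '", dataset, "' not found on any of: ", paste(hosts, collapse = ", "))
}

# Made-up listings standing in for the live server and the August 2017 archive
fake_listings <- list(
  "www.ensembl.org" = data.frame(dataset = "hsapiens_gene_ensembl",
                                 stringsAsFactors = FALSE),
  "aug2017.archive.ensembl.org" = data.frame(
    dataset = c("hsapiens_gene_ensembl", "drerio_gene_ensembl"),
    stringsAsFactors = FALSE)
)
fake_list_fun <- function(host) fake_listings[[host]]

chosen <- pick_host(names(fake_listings), "drerio_gene_ensembl", fake_list_fun)
print(chosen)
```

Against the real servers, the listing function could be `function(h) listDatasets(useMart("ensembl", host = h))`, and the chosen host then passed to `useMart("ensembl", dataset = "drerio_gene_ensembl", host = chosen)`. The order of `hosts` encodes the preference: newest annotation first, archive only as a fallback.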


Good tip. In my case I was being a little lazy: I also use the script for some C. elegans data analysis, and this archive works for both. RIPSeeker needs biomaRt/Ensembl, so I need an archive version from before the move to WormBase. Another hack.

Mike Smith
@mike-smith
EMBL Heidelberg

There was an issue with one of the new primate datasets having an apostrophe in its description, which was causing `listDatasets()` to fail. I have patched this in version 2.35.1 and pushed it to the Bioconductor devel branch. This will take a few days to propagate, so the fastest way to get hold of it is via GitHub using:

```r
BiocInstaller::biocLite('grimbough/biomaRt')
```

If people could report back on whether that works, that would be very helpful; assuming it does, I will also patch the release version of biomaRt.


```r
library(biomaRt)
packageVersion("biomaRt")
## [1] ‘2.35.1’
ensembl_mart <- useMart("ensembl")
dim( listDatasets(ensembl_mart) )
## [1] 97  3
```
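
We do not know the exact parsing code inside biomaRt, but the general failure mode is easy to reproduce with base R's `read.table()`: when `'` is treated as a quote character, an apostrophe in a free-text field opens a quoted string that never closes, so the following rows are swallowed or the parse fails outright. The rows below, including `xyz_gene_ensembl`, are made up for illustration.

```r
# Made-up tab-separated rows; the third description contains an apostrophe.
txt <- paste(
  "hsapiens_gene_ensembl\tHuman genes (GRCh38)\tGRCh38",
  "mmusculus_gene_ensembl\tMouse genes (GRCm38)\tGRCm38",
  "xyz_gene_ensembl\tSomebody's monkey genes (v1)\tv1",
  "drerio_gene_ensembl\tZebrafish genes (GRCz10)\tGRCz10",
  sep = "\n"
)

# Treating ' as a quote character: the apostrophe starts an unterminated quoted
# field, so later rows are lost (or read.table stops with an error -> NA here).
n_fragile <- suppressWarnings(tryCatch(
  nrow(read.table(text = txt, sep = "\t", quote = "\"'",
                  stringsAsFactors = FALSE)),
  error = function(e) NA_integer_
))

# Disabling quote handling keeps every row intact.
robust <- read.table(text = txt, sep = "\t", quote = "",
                     stringsAsFactors = FALSE,
                     col.names = c("dataset", "description", "version"))
print(n_fragile)
print(robust)
```

Setting `quote = ""` is only safe when the fields are known not to contain the separator itself, which holds for tab-separated output like this.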
```
> library(biomaRt)
> packageVersion("biomaRt")
[1] ‘2.35.2’

## [1] 97  3
## [1] "Is drerio present? TRUE"
```

All systems are go :) Thank you for the fix.

Mike Smith
@mike-smith
EMBL Heidelberg

Thanks for the report, I can confirm that I'm seeing the behaviour too.  I suspect this isn't a problem with the biomaRt package, but is related to the latest release of Ensembl, which is happening today (http://www.ensembl.info/blog/2017/12/12/ensembl-91-has-been-released/) and it will be back to normal in a few hours.  I'll keep an eye on it to make sure.


The same issue was also happening yesterday. It is still likely due to the update at Ensembl, but it is suboptimal that `useMart()` by default connects to an instance that is not yet complete, and does so without generating any warnings.
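
Until such a warning exists, a script can check for the dataset itself and fail loudly. A minimal sketch using only base R: `require_dataset()` is a hypothetical wrapper that retries a listing a few times, warns on each miss, and stops with an explicit error if the dataset never appears. The fake `flaky_fetch()` stands in for the server so the example runs offline.

```r
# Hypothetical wrapper: fetch a dataset listing, retrying a few times, and stop
# with an explicit error if `dataset` never appears (instead of failing silently).
# `fetch` is any zero-argument function returning a data frame with a `dataset` column.
require_dataset <- function(fetch, dataset, tries = 3, wait = 0) {
  for (i in seq_len(tries)) {
    listing <- fetch()
    if (dataset %in% listing$dataset) {
      return(listing)
    }
    warning(sprintf("attempt %d/%d: '%s' not listed (%d datasets returned)",
                    i, tries, dataset, nrow(listing)), call. = FALSE)
    if (i < tries) Sys.sleep(wait)  # pause before the next attempt
  }
  stop(sprintf("'%s' missing after %d attempts", dataset, tries), call. = FALSE)
}

# Offline stand-in for the server: the dataset only shows up on the third call
calls <- 0
flaky_fetch <- function() {
  calls <<- calls + 1
  ds <- if (calls >= 3) {
    c("hsapiens_gene_ensembl", "drerio_gene_ensembl")
  } else {
    "hsapiens_gene_ensembl"
  }
  data.frame(dataset = ds, stringsAsFactors = FALSE)
}

listing <- suppressWarnings(
  require_dataset(flaky_fetch, "drerio_gene_ensembl", tries = 5)
)
print(calls)
```

In a real script, `fetch` could be `function() listDatasets(useMart("ensembl"))`, with `wait` set to some seconds so that a server still mid-release has time to settle between attempts.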

 

Thomas Maurel
@thomas-maurel-5295
United Kingdom

There seems to be an issue with the new Ensembl 91 gene mart and the biomaRt package. We are investigating with Mike Smith.

Apologies for any inconvenience caused.
