Ensembl plants at annotationHub?
Guido Hooiveld ★ 4.1k
@guido-hooiveld-2020
Last seen 3 days ago
Wageningen University, Wageningen, the …

Hi,

I regularly use AnnotationHub to retrieve and query the Ensembl-based gene annotations (ensembldb). This works fine for e.g. human and mouse, but I would now like to obtain the information made available through the Ensembl Plants database, specifically for Arabidopsis ( http://plants.ensembl.org/Arabidopsis_thaliana/Info/Index ).

Question: Is such an EnsDb available in AnnotationHub? I searched for it but could not find it.

Thanks,

Guido

ensembl ensembldb plant arabidopsis thaliana annotationhub • 7.4k views
Johannes Rainer ★ 2.1k
@johannes-rainer-6987
Last seen 8 weeks ago
Italy

Dear Guido,

While it is possible to create EnsDb databases for ensemblgenomes as well (including plants, fungi etc.), I have not done this on a regular basis, and I was hesitant to add these to AnnotationHub because I was not sure how many users there would be for them.

Just let me know which species (and which Ensembl/Ensemblgenomes version) you need and I will create the EnsDb for you.

cheers, jo


Hi Johannes,

Thanks for your offer! As far as I am concerned, an EnsDb for the latest Arabidopsis genome annotation would do for now (EnsemblPlants, release 41, Sept 2018, here).

Thanks a lot for your help!

Guido


I don't know if it's helpful, but a recent OrgDb was added to AnnotationHub for Arabidopsis, matching the taxonomy ID on the reference page you listed:

> ah[which(mcols(ah)$taxonomyid==3702)]
AnnotationHub with 5 records
# snapshotDate(): 2018-11-01 
# $dataprovider: UCSC, Inparanoid8, ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Arabidopsis thaliana
# $rdataclass: TxDb, Inparanoid8Db, OrgDb
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH10456"]]' 

            title                                     
  AH10456 | hom.Arabidopsis_thaliana.inp8.sqlite      
  AH52245 | TxDb.Athaliana.BioMart.plantsmart22.sqlite
  AH52246 | TxDb.Athaliana.BioMart.plantsmart25.sqlite
  AH52247 | TxDb.Athaliana.BioMart.plantsmart28.sqlite
  AH66148 | org.At.tair.db.sqlite
> ah["AH66148"]
AnnotationHub with 1 record
# snapshotDate(): 2018-11-01 
# names(): AH66148
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
# $species: Arabidopsis thaliana
# $rdataclass: OrgDb
# $rdatadateadded: 2018-10-22
# $title: org.At.tair.db.sqlite
# $description: NCBI gene ID based annotations about Arabidopsis thaliana
# $taxonomyid: 3702
# $genome: NCBI genomes
# $sourcetype: NCBI/ensembl
# $sourceurl: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, ftp://ftp.ensembl.org/p...
# $sourcesize: NA
# $tags: c("NCBI", "Gene", "Annotation") 
# retrieve record with 'object[["AH66148"]]'

Cheers


Lori, do you think it might be useful to also add EnsDbs for all species in ensemblgenomes to AnnotationHub (possibly starting with plants "only")?
 


Let's continue this discussion off the support site.


I've generated the EnsDb. You can get the file from here: https://www.dropbox.com/sh/wglt28zlfzhjubs/AADzGqJ0zydKRmdqbOsH_Ru5a?dl=0

After unzipping, you can simply load the SQLite file with edb <- EnsDb(<sqlite-file>)

 


Thanks! I have meanwhile downloaded the file, and everything is working fine.
 


Hi, I am having a similar problem. How do I access the rice data (Oryza sativa Japonica Group) made available through the Ensembl Plants database? Sorry if this is extremely obvious, I am new to this. Thank you in advance for any help you can provide. Sincerely, Cameron


Dear Cameron,

I create EnsDb annotation resources for all species that are part of the Ensembl core databases; these are then available through AnnotationHub (see also Lori's reply). I don't create them by default for the Ensembl plants, fungi, etc. databases.

It would however not be a big problem for me to create them on demand - just let me know which species and Ensembl release you need (unless the resources already available in AnnotationHub - see Lori's reply - are sufficient).
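
For species that are covered by the Ensembl core databases, listing the EnsDb resources in AnnotationHub could look roughly like the sketch below. The species name is only an arbitrary illustration, and the <AH-id> placeholder has to be replaced with an ID taken from the printed listing.

```r
library(AnnotationHub)

## Connect to the hub (metadata is downloaded and cached on first use).
ah <- AnnotationHub()

## All EnsDb resources for one species; the species is an arbitrary example.
ens <- query(ah, c("EnsDb", "Mus musculus"))
ens

## Titles and dates of the matching records, from the metadata columns.
head(mcols(ens)[, c("title", "rdatadateadded")])

## Load one database by its AnnotationHub ID taken from the listing, e.g.
## edb <- ens[["<AH-id>"]]
```

The same query with a plant species name will show whether any EnsDb has been added for it in the meantime.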

cheers, jo


Hi Johannes,

Could I request the same EnsDb creation for Medicago truncatula?

Thank you very much,

Karen


Hi Karen,

I've created the EnsDb (for Ensembl release 106, which corresponds to ensemblgenomes release 53). You can download the file from here. The file is called EnsDb.Mtruncatula.v106.sqlite. You can simply load this database using the EnsDb function.

cheers, jo


Thank you, this helped a lot! If it's not too much trouble, would it be possible to get the same for "Rhizophagus irregularis DAOM 181602=DAOM 197198 (ASM43914v3)"?

Many thanks,

Karen


Dear Karen,

I had a look at the Ensemblgenomes site for this fungus (here), but could not find the actual MySQL database that contains the gene, protein etc. annotations. Without that I cannot create the EnsDb.

In fact, for fungi, these are all the available databases - could you maybe have a look through them to see if you can identify the one containing annotations for that species? I'm not familiar with the fungal genus and species collections...


Hi Johannes, thanks for trying... I could not find it in those databases. The species is in the list of genomes in the parent directory, but I don't see a corresponding file in any of the SQL databases.

https://fungi.ensembl.org/Rhizophagus_irregularis_daom_181602_daom_197198_gca_002897155/Info/Index

Also sorry for the very delayed response :)


Dear Johannes, can you tell me how I can access an EnsDb annotation for Vitis vinifera?

Many thanks,

António


Dear Antonio,

I can create an EnsDb for Vitis vinifera for you - could you please tell me for which Ensembl (or Ensemblgenomes) release you want it?

thanks, jo


Dear Johannes,

Many thanks for your help. I would like it for the latest genome assembly release, which is PN40024.v4. If that is not possible, the previous version (v3) would be perfectly fine.

best regards,

António


Hi Antonio,

I've created an EnsDb for Ensembl version 107 (which has PN40024.v4). You can download the sqlite file (EnsDb.Vvinifera.v107.sqlite) from here. To use this database:

> library(ensembldb)
> edb <- EnsDb("EnsDb.Vvinifera.v107.sqlite")
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.3.7
|Creation time: Mon Jul 25 08:22:47 2022
|ensembl_version: 107
|ensembl_host: localhost
|Organism: Vitis vinifera
|taxonomy_id: 29760
|genome_build: PN40024.v4
|DBSCHEMAVERSION: 2.2
| No. of genes: 35134.
| No. of transcripts: 41097.
|Protein data available.

cheers, jo


Many thanks,

works fine,

Best regards,

António


Hi Johannes,

Would it be possible to get EnsDbs created for Trichoderma reesei (GCA_000167675.2) and Aspergillus oryzae (ASM18445v3), if it's not too much trouble?

Thank you so much!

Emmi


Hi Emmi,

the two EnsDbs are now also available in this folder (EnsDb.Aoryzae.v111.sqlite and EnsDb.Treesei.v111.sqlite).

Best, jo


Thank you! This helps a lot.

Emmi


I was wondering: is it possible to add data to the EnsDbs afterwards, or would you have to start from the beginning with the files? We have some Gene Ontology data for the organisms, generated with Blast2GO, which would be a nice addition, and I was wondering if I could somehow add it.

Thank you for your answer!

Emmi


Sorry, but there is no option to add additional (external) data to the EnsDb databases - they are built from Ensembl annotations and by design contain only these annotations. Depending on the use case, a workaround could be to extract the annotations from the EnsDb as a GRanges object and then add the additional information to that.
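
A minimal sketch of that workaround, assuming the Blast2GO results have been read into a data.frame with the columns gene_id and go_id (both column names and the toy gene IDs are hypothetical). A small hand-made GRanges stands in for genes(edb) so that the example runs without the database:

```r
library(GenomicRanges)

## In practice this would be: gns <- genes(edb)
## A toy GRanges is used here as a stand-in (hypothetical gene IDs).
gns <- GRanges(
    seqnames = c("1", "1", "2"),
    ranges = IRanges(start = c(100, 500, 200), end = c(300, 900, 400)),
    strand = c("+", "-", "+"),
    gene_id = c("geneA", "geneB", "geneC")
)
names(gns) <- gns$gene_id

## Hypothetical Blast2GO export: one row per gene/GO term pair.
go_tab <- data.frame(
    gene_id = c("geneA", "geneA", "geneC"),
    go_id = c("GO:0003674", "GO:0008150", "GO:0005575")
)

## Collapse the GO terms to one ";"-separated string per gene.
go_by_gene <- vapply(split(go_tab$go_id, go_tab$gene_id),
                     paste, character(1), collapse = ";")

## Add them as a new metadata column, aligned by gene ID.
## Genes without any GO term get NA.
gns$go_id <- unname(go_by_gene[gns$gene_id])
gns
```

The EnsDb itself stays untouched; the extra column lives only in the GRanges object, which can be saved with saveRDS() if it is needed again.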


Thank you for your reply. Extracting annotations and adding more information could work in some cases, so I will try that.


Dear Johannes,

Would it be possible to create the EnsDb for Triticum aestivum (IWGSC)?

Thank you very much,

Daniele


Dear Daniele,

For which Ensembl release would you need the data? The most recent is 112, but if you used a different version before, it would be good to know which release you need. Actually, even better than the Ensembl release would be the ensemblgenomes version, since the two use different version numbers...

cheers, jo


Hi,

I'm using Ensembl Plants release 59 with the IWGSC RefSeq v1.1 gene annotation.

Thanks a lot for replying so quickly!


You can get the EnsDb EnsDb.Taestivum.v112.sqlite (for Ensembl 112 / ensemblgenomes 59) here.


Thank you so much!

> ah = AnnotationHub()
snapshotDate(): 2020-12-19

> query(ah, "Oryza sativa")
AnnotationHub with 4 records
# snapshotDate(): 2020-12-19
# $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/, Inparanoid8
# $species: Oryza sativa_subsp._japonica, Oryza sativa_Japonica_Group, Oryza...
# $rdataclass: OrgDb, Inparanoid8Db
# additional mcols(): taxonomyid, genome, description,
#   coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
#   rdatapath, sourceurl, sourcetype 
# retrieve records with, e.g., 'object[["AH10561"]]' 

            title                                               
  AH10561 | hom.Oryza_sativa.inp8.sqlite                        
  AH85565 | org.Oryza_sativa_(japonica_cultivar-group).eg.sqlite
  AH85566 | org.Oryza_sativa_Japonica_Group.eg.sqlite           
  AH85567 | org.Oryza_sativa_subsp._japonica.eg.sqlite

It looks like there are three records that could be of interest and could be used.

Bruno • 0
@d209d072
Last seen 15 months ago
France

Hi Johannes,

First, thank you so much for providing these annotations for the community. We often feel a bit left out in the plant community ^^ I have been trying to find a good Arabidopsis thaliana annotation for single-cell ATAC-seq analysis, and downloaded the Arabidopsis annotation you created 4.5 years ago. Unfortunately, R displays the error "Annotation must be a GRanges object". I tried to convert it using different programs but did not succeed. Would you have a solution for that?

Thank you very much in advance for your help,

Bruno


You did not show the code that resulted in that error, so we have to guess what you did. That said, the R package Signac contains the function GetGRangesFromEnsDb() (link). Maybe that is worth looking at? Or did you already do so and refer to it as a 'different program'?

Note that Signac is not a Bioconductor package, and therefore questions on Signac are best asked on its own website. Yet, if the function works fine on another EnsDb, then it is (indeed) related to the Arabidopsis EnsDb.


Hi Guido,

Thanks for your fast reply. The error was, as said in my question, "Annotation must be a GRanges object", and it appeared when I tried to use the annotation for my dataset. I had tried makeGRangesFromEnsDb() without success, but I had missed GetGRangesFromEnsDb(). It works with Johannes' annotation.

Thanks again


Hi Johannes,

I successfully used your previous EnsDb annotation for Arabidopsis thaliana, but many new genes are not annotated in that version. Would it be possible to have a new EnsDb annotation for Arabidopsis thaliana? The newest gene annotation is called Arabidopsis_thaliana.TAIR10.55 on Tair. Would you also be able to provide a tutorial on how to do it, so people stop bothering you? I could not find anything online that works.

Thank you very much for your help,


Hi Bruno,

There is information in the ensembldb vignette on how to build an EnsDb directly from the Ensembl MySQL database(s) - but it's not straightforward to get the Ensembl Perl API and the required Perl version installed properly.

I've created the EnsDb for Arabidopsis thaliana (genome build TAIR10) for Ensembl release 110 (the current version). You can download the sqlite file from here. Please let me know if that was not the version you were looking for.

Best, jo


Hi Jo,

It works like a charm. Thanks a lot for your help !! Bru

Shiva • 0
@e1ebefbf
Last seen 8 months ago
India

Hi Johannes,

I am currently working on Oryza sativa indica and am not able to find an Ensembl annotation database for it. Is it possible to create an EnsDb for Oryza sativa indica? This is the URL for the genome: https://plants.ensembl.org/Oryza_indica/Info/Index


Do you need any particular genome/Ensembl release version, or is Ensembl Plants 58 with genome version ASM465v1 OK?


I have now added the EnsDb for Oryza indica (EnsDb.Oindica.v111.sqlite) to the shared folder with all custom-made databases (here).
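
A short sketch of how such a file could be explored once it has been downloaded, assuming it sits in the current working directory (adjust the path otherwise). Picking the first sequence name is only for illustration:

```r
library(ensembldb)

## File name as given above; adjust the path to where the file was saved.
edb <- EnsDb("EnsDb.Oindica.v111.sqlite")

## What is in the database?
edb                  # summary: organism, Ensembl version, gene counts
listColumns(edb)     # all annotation columns that can be requested
seqlevels(edb)       # chromosome/scaffold names

## All genes as a GRanges object.
gns <- genes(edb)

## Genes on a single sequence; the first seqlevel is used only as an example.
gns_one <- genes(edb, filter = SeqNameFilter(seqlevels(edb)[1]))

## Transcripts with a selected set of annotation columns.
txs <- transcripts(edb, columns = c("tx_id", "gene_id", "tx_biotype"))
```

The ensembldb vignette describes the available filters and the other accessor functions (exons(), proteins(), etc.) in more detail.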

cheers, jo


Thank you so much, ASM465v1 was fine.

May I know how to use it?
