Please note that some important tweaks were made to makeTxDbFromBiomart() last week to improve its support for Ensembl Genomes (see the answer "Errors with makeTxDbFromBiomart" for the details), so make sure you use the latest version of GenomicFeatures (1.28.5) before trying the above.
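A quick way to check whether an installed version is recent enough is sketched below. The helper name `needs_update` is made up for this illustration, and the 1.28.5 threshold is simply the version mentioned above; only base R functions are used.

```r
# Hypothetical helper: TRUE if `installed` is older than `required`.
# package_version() parses version strings so they compare numerically,
# not alphabetically.
needs_update <- function(installed, required = "1.28.5") {
  package_version(installed) < package_version(required)
}

# Only query the installed version if the package is actually present.
if (requireNamespace("GenomicFeatures", quietly = TRUE)) {
  installed <- as.character(packageVersion("GenomicFeatures"))
  if (needs_update(installed)) {
    message("GenomicFeatures ", installed, " is older than 1.28.5; please update.")
  }
}
```

If the check reports an old version, updating through the usual Bioconductor installer for your R release should bring in the fix.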
Ensembl Genomes provides gene models for many plants; see http://plants.ensembl.org/index.html . You could download a GTF or GFF3 file for rice from there and build a TxDb using makeTxDbFromGFF (GenomicFeatures package) or, since the data is in Ensembl format, an EnsDb using ensDbFromGtf (ensembldb package; EnsDb and TxDb packages/databases provide the same functionality/annotations).
Note that an EnsDb created from a GTF file might lack some annotations, since they are not provided in the file. If you tell me which release and species you need (which of the many Oryza forms, e.g. oryza_sativa, oryza_meridionalis etc.), I could build the EnsDb database/package for you directly from the Ensembl Genomes MySQL databases. Just let me know.
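As a rough sketch of the two routes, assuming the rice GTF has already been downloaded into the working directory (the file name below is the one used later in this thread; adjust it to whatever you downloaded):

```r
library(GenomicFeatures)  # provides makeTxDbFromGFF()
library(ensembldb)        # provides ensDbFromGtf() and EnsDb()

# Assumption: this file was downloaded beforehand from Ensembl Plants.
gtf <- "Oryza_sativa.IRGSP-1.0.37.gtf.gz"

if (file.exists(gtf)) {
  # Route 1: TxDb built from the GTF file.
  txdb <- makeTxDbFromGFF(gtf, format = "gtf")

  # Route 2: EnsDb. ensDbFromGtf() writes an SQLite file and returns
  # its name; EnsDb() then opens that file.
  db_file <- ensDbFromGtf(gtf)
  edb <- EnsDb(db_file)
}
```

Whether both routes succeed can depend on the package versions installed, so it is worth trying the other one if the first fails.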
I would use the first one, or the second, which to my understanding contains only genes encoded on chromosomes (the other might also contain genes encoded on contigs).
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in c(x, value) :
could not find symbol "recursive" in environment of the generic function
This might be a problem in the makeTxDbFromGFF function from the GenomicFeatures package. It works with ensDbFromGtf from the ensembldb package:
> library(ensembldb)
> dbf <- ensDbFromGtf("Oryza_sativa.IRGSP-1.0.37.gtf.gz")
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
Attribute availability:
o gene_id ... OK
o gene_name ... OK
o entrezid ... Nope
o gene_biotype ... OK
OK
Processing transcripts ...
Attribute availability:
o transcript_id ... OK
o gene_id ... OK
o transcript_biotype ... OK
OK
Processing exons ... OK
Processing chromosomes ... Fetch seqlengths from ensembl ... OK
Generating index ... OK
-------------
Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
Warning messages:
1: call dbDisconnect() when finished working with a connection
2: In ensDbFromGRanges(GTF, outfile = outfile, path = path, organism = organism, :
I'm missing column(s): 'entrezid'. The corresponding database column(s) will be empty!
3: closing unused connection 7 (ftp://ftp.ensemblgenomes.org/pub/release-37/plants/mysql/)
4: closing unused connection 6 (ftp://ftp.ensemblgenomes.org/pub/release-37/metazoa/mysql/)
5: closing unused connection 5 (ftp://ftp.ensemblgenomes.org/pub/release-37/fungi/mysql/)
6: closing unused connection 4 (ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/mysql/)
7: closing unused connection 3 (ftp://ftp.ensembl.org/pub/release-37/mysql/)
> edb <- EnsDb(dbf)
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.0.1
|Creation time: Sat Nov 25 18:53:08 2017
|ensembl_version: 37
|ensembl_host: unknown
|Organism: Oryza_sativa
|genome_build: IRGSP-1.0
|DBSCHEMAVERSION: 1.0
|source_file: Oryza_sativa.IRGSP-1.0.37.gtf.gz
| No. of genes: 91992.
| No. of transcripts: 98663.
>
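Once the EnsDb exists, annotations can be pulled out of it much as from a TxDb. A minimal sketch follows; the SQLite file name is an assumption about what ensDbFromGtf wrote (in a live session, use the value it returned, `dbf` above), and the filter class name is the one used in recent ensembldb versions.

```r
library(ensembldb)

# Assumed name of the SQLite file created by ensDbFromGtf(); use the
# path it actually returned in your session.
sqlite_file <- "Oryza_sativa.IRGSP-1.0.37.sqlite"

if (file.exists(sqlite_file)) {
  edb <- EnsDb(sqlite_file)

  # All genes as a GRanges object, with a summary of their biotypes.
  gns <- genes(edb)
  print(table(gns$gene_biotype))

  # Transcripts of protein-coding genes only.
  txs <- transcripts(edb, filter = GeneBiotypeFilter("protein_coding"))
  print(length(txs))

  # Sequence names present in the annotation.
  print(seqlevels(gns))
}
```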
Hello,
This worked! Thank you :) If you post your answer as a top-level comment I can accept it.
Also, library(GenomicFeatures) should be there for makeTxDbFromBiomart() to work.

Done. I added the library(GenomicFeatures) line. Thanks for the feedback!
Cheers,
H.