How to create a tx2gene data.frame when there is no TxDb object for the organism you are working with?
prab4th
@prab4th-14026
United States

I have been following the workflow in [Importing transcript abundance datasets with tximport](http://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html), which requires a TxDb object. I am working with rice, and there isn't a TxDb object for rice, but rice does have a BSgenome object.

Is there any way I can use the BSgenome object? I just want to use my Salmon output in DESeq2.
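For context, tx2gene is just a two-column table: transcript IDs in the first column and the IDs of the genes they belong to in the second. tximport uses it to collapse transcript-level estimates (such as Salmon's) into gene-level values for DESeq2. The base-R sketch that follows illustrates that idea for counts only. It is not tximport's implementation (tximport also summarizes abundances and effective lengths), and the helper name summarize_tx_counts and the toy IDs are hypothetical.

```r
# Illustration only (not tximport's implementation): sum transcript-level
# counts into gene-level counts using a tx2gene data.frame whose first column
# holds transcript IDs and whose second column holds gene IDs.
summarize_tx_counts <- function(tx_counts, tx2gene) {
  # Look up the gene of each transcript; names(tx_counts) are transcript IDs
  gene <- tx2gene[[2]][match(names(tx_counts), tx2gene[[1]])]
  if (anyNA(gene)) {
    stop("some transcripts are missing from tx2gene")
  }
  # rowsum() adds up all values sharing the same gene ID (groups sorted)
  drop(rowsum(tx_counts, group = gene))
}

# Hypothetical toy input: two transcripts of gene G1, one of gene G2
tx_counts <- c(T1a = 10, T1b = 5, T2a = 7)
tx2gene <- data.frame(TXNAME = c("T1a", "T1b", "T2a"),
                      GENEID = c("G1", "G1", "G2"),
                      stringsAsFactors = FALSE)
summarize_tx_counts(tx_counts, tx2gene)  # G1 = 15, G2 = 7
```

The sketch stops on transcripts that have no gene in tx2gene so that an ID mismatch is noticed early; tximport itself handles this case on its own terms.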

Tags: tximport, deseq2, annotationdbi
@herve-pages-1542
Seattle, WA, United States

Hi,

Alternatively, you can use makeTxDbFromBiomart() to make a TxDb object from the Ensembl Plants mart:

library(biomaRt)
mart <- useMart(biomart="plants_mart", host="plants.ensembl.org")
datasets <- listDatasets(mart)
datasets[1:6 , 1:2]
#                dataset                               description
# 1    atauschii_eg_gene              Aegilops tauschii genes (...
# 2 obrachyantha_eg_gene Oryza brachyantha genes (Oryza_brachya...
# 3 ptrichocarpa_eg_gene                Populus trichocarpa gen...
# 4     ppersica_eg_gene                   Prunus persica genes...
# 5   stuberosum_eg_gene              Solanum tuberosum genes (...
# 6     sitalica_eg_gene                   Setaria italica gene...
idx <- grep("oryza", datasets$description, ignore.case=TRUE)
datasets[idx, 1:2]
#                    dataset                           description
# 2     obrachyantha_eg_gene  Oryza brachyantha genes (Oryza_br...
# 8          onivara_eg_gene                  Oryza nivara gene...
# 14       opunctata_eg_gene                Oryza punctata gene...
# 15         oindica_eg_gene               Oryza sativa Indica ...
# 18   oglumaepatula_eg_gene            Oryza glumaepatula gene...
# 19        obarthii_eg_gene                 Oryza barthii gene...
# 20         osativa_eg_gene            Oryza sativa Japonica g...
# 25   omeridionalis_eg_gene Oryza meridionalis genes (Oryza_me...
# 28      orufipogon_eg_gene                   Oryza rufipogon ...
# 38 olongistaminata_eg_gene Oryza longistaminata genes (O_long...
# 41     oglaberrima_eg_gene                    Oryza glaberrim...

Choose your dataset of interest (e.g. osativa_eg_gene), then:

library(GenomicFeatures)
txdb <- makeTxDbFromBiomart(biomart="plants_mart",
                            dataset="osativa_eg_gene",
                            host="plants.ensembl.org")
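Since the original goal is a tx2gene data.frame rather than the TxDb itself, the tximport vignette's keys()/select() pattern can be applied to the txdb built above. A minimal sketch, assuming AnnotationDbi and GenomicFeatures are installed; the helper name tx2gene_from_txdb is hypothetical:

```r
# Hypothetical helper wrapping the keys()/select() pattern from the tximport
# vignette. AnnotationDbi and GenomicFeatures must be available when it is called.
tx2gene_from_txdb <- function(txdb) {
  # All transcript names stored in the TxDb
  k <- AnnotationDbi::keys(txdb, keytype = "TXNAME")
  # data.frame with TXNAME (transcript ID) first and GENEID second,
  # the column order tximport expects for tx2gene
  AnnotationDbi::select(txdb, keys = k, columns = "GENEID", keytype = "TXNAME")
}
```

Usage would then be along the lines of tx2gene <- tx2gene_from_txdb(txdb) followed by tximport::tximport(files, type = "salmon", tx2gene = tx2gene), where files is a (hypothetical) vector of paths to the quant.sf files.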

Please note that some important changes were made to makeTxDbFromBiomart() last week to improve its support for Ensembl Genomes (see the support-site answer "Errors with makeTxDbFromBiomart" for details), so make sure you are using the latest version of GenomicFeatures (1.28.5) before trying the above.

Cheers,

H.


Hello,

This worked, thank you! :) If you post this as a top-level answer, I can accept it.

Also, library(GenomicFeatures) needs to be loaded for makeTxDbFromBiomart() to work.


Done. I added the library(GenomicFeatures) line. Thanks for the feedback!

Cheers,

H.

Johannes Rainer
@johannes-rainer-6987
Italy

Ensembl Genomes provides gene models for many plants; see http://plants.ensembl.org/index.html. You could download a GTF or GFF3 file for rice from there and either build a TxDb with makeTxDbFromGFF (GenomicFeatures package) or, since the data is in Ensembl format, build an EnsDb with ensDbFromGtf (ensembldb package; EnsDb and TxDb databases provide the same functionality and annotations).
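If building a TxDb or EnsDb is not an option, a tx2gene table can also be read straight from the downloaded GTF, because transcript-level records (transcript, exon, CDS, ...) carry both gene_id and transcript_id attributes in the ninth column. A minimal base-R sketch, assuming Ensembl-style attribute formatting (key "value"; pairs); the function name is hypothetical, and the whole file is read into memory:

```r
# Minimal base-R sketch: build a tx2gene data.frame from the lines of a GTF
# file, assuming attributes formatted like: gene_id "X"; transcript_id "Y";
tx2gene_from_gtf_lines <- function(lines) {
  lines <- lines[!startsWith(lines, "#")]         # drop header/comment lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  fields <- fields[lengths(fields) >= 9]          # keep 9-column records only
  attrs <- vapply(fields, `[`, character(1), 9)   # column 9 holds attributes

  # Quoted value of one attribute key; NA when the key is absent
  get_attr <- function(x, key) {
    m <- regmatches(x, regexec(paste0('(^|; )', key, ' "([^"]*)"'), x))
    vapply(m, function(v) if (length(v) == 3) v[3] else NA_character_,
           character(1))
  }

  tx2gene <- data.frame(TXNAME = get_attr(attrs, "transcript_id"),
                        GENEID = get_attr(attrs, "gene_id"),
                        stringsAsFactors = FALSE)
  # Gene rows have no transcript_id; exon/CDS rows repeat the same pair
  tx2gene <- tx2gene[!is.na(tx2gene$TXNAME) & !is.na(tx2gene$GENEID), ]
  tx2gene <- unique(tx2gene)
  rownames(tx2gene) <- NULL
  tx2gene
}
```

For the rice file this could be called as tx2gene_from_gtf_lines(readLines(gzfile("Oryza_sativa.IRGSP-1.0.37.gtf.gz"))).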

Note that an EnsDb created from a GTF file might lack some annotations, since they are not provided in the file. If you tell me which release and species you need (which of the many Oryza forms, e.g. oryza_sativa, oryza_meridionalis, etc.), I could build the EnsDb database/package for you directly from the Ensembl Genomes MySQL databases; just let me know.

cheers, jo


I'll try `makeTxDbFromGFF` first and get back to you if I can't get it to work. Thanks, Jo!


These are the files available for Oryza sativa at ftp://ftp.ensemblgenomes.org/pub/plants/release-37/gff3/oryza_sativa/

File: Oryza_sativa.IRGSP-1.0.37.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chr.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.abinitio.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.1.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.3.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.2.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.4.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.6.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.5.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.7.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.8.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.11.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.12.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.9.gff3.gz
File: Oryza_sativa.IRGSP-1.0.37.chromosome.10.gff3.gz

Should I use the first file, or should I combine the per-chromosome files in some way before feeding them into makeTxDbFromGFF?


I would use the first one, or the second, which to my understanding contains only genes encoded on chromosomes (the first might also contain genes encoded on contigs).
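To see what the choice between the full file and the .chr file means for tx2gene, here is a minimal base-R sketch that pulls transcript-to-gene pairs from GFF3 lines. It assumes Ensembl-style GFF3 attributes, where transcript records carry ID=transcript:<id> and Parent=gene:<id>; check this against the actual file. The function name and the keep_seqids argument are hypothetical; passing the chromosome names roughly mimics using the .chr file.

```r
# Minimal base-R sketch, assuming Ensembl-style GFF3 attributes where
# transcript rows carry ID=transcript:<tx> and Parent=gene:<gene>.
# keep_seqids (hypothetical) optionally restricts the result to the given
# sequence names (column 1), e.g. the chromosomes.
tx2gene_from_gff3_lines <- function(lines, keep_seqids = NULL) {
  lines <- lines[!startsWith(lines, "#")]          # drop ##gff-version etc.
  fields <- strsplit(lines, "\t", fixed = TRUE)
  fields <- fields[lengths(fields) >= 9]
  seqid <- vapply(fields, `[`, character(1), 1)
  attrs <- vapply(fields, `[`, character(1), 9)

  # Value of key=value in the ;-separated column 9; NA when absent
  get_attr <- function(x, key) {
    m <- regmatches(x, regexec(paste0("(^|;)", key, "=([^;]*)"), x))
    vapply(m, function(v) if (length(v) == 3) v[3] else NA_character_,
           character(1))
  }

  id <- get_attr(attrs, "ID")
  parent <- get_attr(attrs, "Parent")
  # Select transcripts by ID prefix rather than by feature type
  # (mRNA, ncRNA, ... all share the "transcript:" prefix under this assumption)
  is_tx <- !is.na(id) & startsWith(id, "transcript:") & !is.na(parent)
  if (!is.null(keep_seqids)) {
    is_tx <- is_tx & seqid %in% keep_seqids
  }
  tx2gene <- unique(data.frame(TXNAME = sub("^transcript:", "", id[is_tx]),
                               GENEID = sub("^gene:", "", parent[is_tx]),
                               stringsAsFactors = FALSE))
  rownames(tx2gene) <- NULL
  tx2gene
}
```

With the full file, keep_seqids = as.character(1:12) would keep only the 12 rice chromosomes, if they are named 1 to 12 in the file (an assumption to verify).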


Hey, I'm using DESeq2 after kallisto to analyze rice data. I'm using an Ensembl GTF and want to create a TxDb. I used this call:

txdb2 <- makeTxDbFromGFF(file="C:/Users/Dee/Desktop/Thesis_rice/Oryza_sativa.IRGSP-1.0.37.gtf",
                         dataSource="ftp://ftp.ensemblgenomes.org/pub/plants/release-37/gtf/oryza_sativa/",
                         organism="Oryza sativa")

and got this error:

Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... Error in c(x, value) : 
  could not find symbol "recursive" in environment of the generic function

Any help?


This might be a problem in the makeTxDbFromGFF function from the GenomicFeatures package. It works with ensDbFromGtf from the ensembldb package:

> library(ensembldb)
> dbf <- ensDbFromGtf("Oryza_sativa.IRGSP-1.0.37.gtf.gz")
Importing GTF file ... OK
Processing metadata ... OK
Processing genes ...
 Attribute availability:
  o gene_id ... OK
  o gene_name ... OK
  o entrezid ... Nope
  o gene_biotype ... OK
OK
Processing transcripts ...
 Attribute availability:
  o transcript_id ... OK
  o gene_id ... OK
  o transcript_biotype ... OK
OK
Processing exons ... OK
Processing chromosomes ... Fetch seqlengths from ensembl ... OK
Generating index ... OK
  -------------
Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
Warning messages:
1: call dbDisconnect() when finished working with a connection
2: In ensDbFromGRanges(GTF, outfile = outfile, path = path, organism = organism,  :
   I'm missing column(s): 'entrezid'. The corresponding database column(s) will be empty!
3: closing unused connection 7 (ftp://ftp.ensemblgenomes.org/pub/release-37/plants/mysql/)
4: closing unused connection 6 (ftp://ftp.ensemblgenomes.org/pub/release-37/metazoa/mysql/)
5: closing unused connection 5 (ftp://ftp.ensemblgenomes.org/pub/release-37/fungi/mysql/)
6: closing unused connection 4 (ftp://ftp.ensemblgenomes.org/pub/release-37/bacteria/mysql/)
7: closing unused connection 3 (ftp://ftp.ensembl.org/pub/release-37/mysql/)
> edb <- EnsDb(dbf)
> edb
EnsDb for Ensembl:
|Backend: SQLite
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.0.1
|Creation time: Sat Nov 25 18:53:08 2017
|ensembl_version: 37
|ensembl_host: unknown
|Organism: Oryza_sativa
|genome_build: IRGSP-1.0
|DBSCHEMAVERSION: 1.0
|source_file: Oryza_sativa.IRGSP-1.0.37.gtf.gz
| No. of genes: 91992.
| No. of transcripts: 98663.
>
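Since the goal is a tx2gene table for tximport, the EnsDb can provide one directly. A sketch, assuming ensembldb is installed and edb is the EnsDb created above; it relies on the transcripts() method that ensembldb provides for EnsDb objects, with its columns and return.type arguments. The helper name is hypothetical.

```r
# Hypothetical helper: one row per transcript with its gene, taken from an EnsDb.
# ensembldb must be loaded (it is, if `edb` was created with EnsDb()).
tx2gene_from_ensdb <- function(edb) {
  tx <- GenomicFeatures::transcripts(edb, columns = c("tx_id", "gene_id"),
                                     return.type = "data.frame")
  # Keep only the two ID columns, transcript first, as tximport expects
  unique(tx[, c("tx_id", "gene_id")])
}
```

It could then be passed as tximport::tximport(files, type = "kallisto", tx2gene = tx2gene_from_ensdb(edb)), where files is a (hypothetical) vector of paths to the kallisto outputs; the transcript IDs in those files must match tx_id.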

cheers, jo


Thanks a lot!!


Hi,

Is there any reason you used kallisto rather than Salmon?


I'm using both for comparison.

cheers,

Dina
