Question about TranscriptDb and makeTranscriptDb method
Song Li
Hi all,

I want to thank you for the incredible package, which greatly simplifies our analysis for RNA-seq.

However, I am working with Arabidopsis RNA-seq data, and it seems that I have to build a TranscriptDb object myself. Is there a function that reads a GTF file and makes a TranscriptDb object?

Thanks,
Song Li

-- 
Postdoctoral Associate
Institute for Genome Sciences and Policy
Duke University
Hervé Pagès (Seattle, WA, United States)
Hi Song,

On 12/10/2010 11:16 AM, Song Li wrote:
> Is there a function that reads a GTF file and makes a TranscriptDb object?

No, we don't have this yet, but we might add it in the future. In the meantime, you can build a TranscriptDb object for Arabidopsis by using the alyrata_eg_gene dataset from the plant_mart_7 Mart:

> library(GenomicFeatures)
> txdb <- makeTranscriptDbFromBiomart("plant_mart_7", "alyrata_eg_gene")
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
Warning messages:
1: In .normargSplicings(splicings, unique_tx_ids) :
  no CDS information for this TranscriptDb object
2: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
  chromosome lengths and circularity flags are not available for this TranscriptDb object
> txdb
TranscriptDb object:
| Db type: TranscriptDb
| Data source: BioMart
| BioMart database: plant_mart_7
| BioMart database version: ENSEMBL PLANT 7 (EBI UK)
| BioMart dataset: alyrata_eg_gene
| BioMart dataset description: Arabidopsis lyrata genes (Araly1)
| BioMart dataset version: Araly1
| Full dataset: yes
| transcript_nrow: 32667
| exon_nrow: 174271
| cds_nrow: 0
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2010-12-11 17:43:13 -0800 (Sat, 11 Dec 2010)
| GenomicFeatures version at creation time: 1.2.3
| RSQLite version at creation time: 0.9-4
| DBSCHEMAVERSION: 1.0

Just a reminder, though: if you decide to use this, it's *crucial* that you align your RNA-seq data against the reference genome that corresponds to those annotations (I'm not sure which one it is; you'll need to investigate).

Cheers,
H.

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
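Since the question is about building a TranscriptDb by hand: besides BioMart, GenomicFeatures can also build one from plain data frames of transcripts and exons (the `makeTranscriptDb()` route in the thread title). The sketch below assumes a current Bioconductor, where TranscriptDb has been renamed TxDb, `makeTranscriptDb()` is called `makeTxDb()`, and in recent releases that function lives in the txdbmaker package rather than GenomicFeatures. Later releases also added `makeTxDbFromGFF()` for reading GTF/GFF files directly. All transcript IDs, names and coordinates below are hypothetical toy values, not real Arabidopsis annotation.

```r
## Minimal sketch: build a TxDb (formerly "TranscriptDb") from data frames.
## All IDs and coordinates are hypothetical toy values for illustration only.
suppressPackageStartupMessages(library(GenomicFeatures))

## makeTxDb() moved to the txdbmaker package in newer Bioconductor releases;
## fall back to GenomicFeatures on older installations.
make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDb
} else {
  GenomicFeatures::makeTxDb
}

## One row per transcript (required: tx_id, tx_chrom, tx_strand,
## tx_start, tx_end; tx_name is optional).
tx_df <- data.frame(
  tx_id     = 1:2,
  tx_name   = c("toyTx.1", "toyTx.2"),
  tx_chrom  = "chr1",
  tx_strand = "+",
  tx_start  = c(100L, 100L),
  tx_end    = c(900L, 600L)
)

## One row per exon of each transcript (required: tx_id, exon_rank,
## exon_start, exon_end). No cds_start/cds_end columns are given, so the
## result has no CDS ranges, as in the BioMart example above.
splice_df <- data.frame(
  tx_id      = c(1L, 1L, 1L, 2L, 2L),
  exon_rank  = c(1L, 2L, 3L, 1L, 2L),
  exon_start = c(100L, 400L, 800L, 100L, 400L),
  exon_end   = c(200L, 600L, 900L, 200L, 600L)
)

## Optional transcript-to-gene mapping.
gene_df <- data.frame(tx_id = 1:2, gene_id = "toyGene")

txdb <- make_txdb(tx_df, splice_df, genes = gene_df)
txdb
transcripts(txdb)
```

In practice the two data frames would be filled from your own annotation file; the column names are what `makeTxDb()` expects, while the parsing step is left to you.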
Hi Hervé,

Thank you for the reply.

I am a little worried about the warning message that "CDS" information is not available. However, it does not seem to be a crucial factor to consider at this moment.

Best,
Song

2010/12/11 Hervé Pagès <hpages at fhcrc.org>:
> Warning messages:
> 1: In .normargSplicings(splicings, unique_tx_ids) :
>   no CDS information for this TranscriptDb object
Hi Song,

The message about CDS availability refers only to the ranges needed to populate the CDS tables. If, like a lot of people, you will only be asking questions about transcripts and exons, I bet this will not affect you.

Marc

On 12/13/2010 07:00 AM, Song Li wrote:
> I am a little worried about the warning message that "CDS" is not available.
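To illustrate Marc's point, here is a minimal sketch of a TxDb that, like the BioMart one above, has no CDS information: `cds()` comes back empty, while exon-by-transcript queries, such as the per-transcript exonic lengths often used in RNA-seq normalization, still work. It assumes the current TxDb API (`makeTxDb()`, found in txdbmaker on recent releases and in GenomicFeatures on older ones), and the single transcript is a hypothetical toy example.

```r
## Sketch: a TxDb without CDS information still answers transcript/exon queries.
## The single toy transcript below is hypothetical.
suppressPackageStartupMessages(library(GenomicFeatures))
make_txdb <- if (requireNamespace("txdbmaker", quietly = TRUE)) {
  txdbmaker::makeTxDb
} else {
  GenomicFeatures::makeTxDb
}

txdb <- make_txdb(
  transcripts = data.frame(tx_id = 1L, tx_chrom = "chr1", tx_strand = "+",
                           tx_start = 100L, tx_end = 900L),
  ## Two exons; no cds_start/cds_end columns, hence no CDS ranges.
  splicings   = data.frame(tx_id = 1L, exon_rank = 1:2,
                           exon_start = c(100L, 700L),
                           exon_end   = c(300L, 900L))
)

length(cds(txdb))                # empty: no CDS ranges were supplied
exby <- exonsBy(txdb, by = "tx") # exons grouped by transcript
exby
sum(width(exby))                 # total exonic length per transcript
```

Each exon here spans 201 bp (inclusive coordinates), so the single transcript's exonic length is 402 bp.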