Hi all,
Thank you for this incredible package, which greatly simplifies our
RNA-seq analysis.
I am working with Arabidopsis RNA-seq data, and it seems that I have
to build a TranscriptDb object myself. Is there a function that reads
a GTF file and makes a TranscriptDb object?
Thanks,
Song Li
--
Postdoctoral Associate
Institute for Genome Sciences and Policy
Duke University
Hi Song,
On 12/10/2010 11:16 AM, Song Li wrote:
> Hi all,
>
> Thank you for this incredible package, which greatly simplifies our
> RNA-seq analysis.
>
> I am working with Arabidopsis RNA-seq data, and it seems that I have
> to build a TranscriptDb object myself. Is there a function that reads
> a GTF file and makes a TranscriptDb object?
No, we don't have this yet, but we might add it in the future.
In the meantime, you can build a TranscriptDb object for
Arabidopsis by using the alyrata_eg_gene dataset from the
plant_mart_7 Mart:
> library(GenomicFeatures)
> txdb <- makeTranscriptDbFromBiomart("plant_mart_7", "alyrata_eg_gene")
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
Warning messages:
1: In .normargSplicings(splicings, unique_tx_ids) :
  no CDS information for this TranscriptDb object
2: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
  chromosome lengths and circularity flags are not available for this TranscriptDb object
> txdb
TranscriptDb object:
| Db type: TranscriptDb
| Data source: BioMart
| BioMart database: plant_mart_7
| BioMart database version: ENSEMBL PLANT 7 (EBI UK)
| BioMart dataset: alyrata_eg_gene
| BioMart dataset description: Arabidopsis lyrata genes (Araly1)
| BioMart dataset version: Araly1
| Full dataset: yes
| transcript_nrow: 32667
| exon_nrow: 174271
| cds_nrow: 0
| Db created by: GenomicFeatures package from Bioconductor
| Creation time: 2010-12-11 17:43:13 -0800 (Sat, 11 Dec 2010)
| GenomicFeatures version at creation time: 1.2.3
| RSQLite version at creation time: 0.9-4
| DBSCHEMAVERSION: 1.0
Just a reminder though that if you decide to use this then it's
*crucial* that you align your RNA-seq data against the reference
genome that corresponds to those annotations (I'm not sure which
one it is, you'll need to investigate).
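One cheap first check is to compare the chromosome/scaffold names used
by the annotations with those used by your alignments. The following is
only a minimal sketch in base R: compare_seqnames is a hypothetical
helper (not part of GenomicFeatures) and the names in the example are
made up. Matching names are necessary but not sufficient, since two
different assemblies can share the same names.

```r
# Hypothetical helper (not part of GenomicFeatures): report which sequence
# names are shared and which appear on only one side.
compare_seqnames <- function(annotation_chroms, alignment_chroms) {
  annotation_chroms <- unique(as.character(annotation_chroms))
  alignment_chroms <- unique(as.character(alignment_chroms))
  list(
    shared          = intersect(annotation_chroms, alignment_chroms),
    annotation_only = setdiff(annotation_chroms, alignment_chroms),
    alignment_only  = setdiff(alignment_chroms, annotation_chroms)
  )
}

# Made-up example values, for illustration only.
ann <- c("scaffold_1", "scaffold_2", "scaffold_3")
aln <- c("scaffold_1", "scaffold_2", "chrC")
res <- compare_seqnames(ann, aln)
```

Anything listed under annotation_only or alignment_only is worth
investigating before counting reads against the annotations.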
Cheers,
H.
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Hi Hervé,
Thank you for the reply.
I am a little worried about the warning message saying that CDS
information is not available. However, it does not seem to be a
crucial factor to consider at this moment.
Best,
Song
Hi Song,
The warning about CDS availability only means that the ranges needed
to populate the CDS tables were not available (hence cds_nrow: 0 in
the summary). If, like a lot of people, you will only be asking
questions about transcripts and exons, I bet that this will not
affect you.
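If you want to see this for yourself, something like the sketch below
should do. It assumes the GenomicFeatures package is installed and that
txdb is the object built in Hervé's message; count_txdb_features is a
hypothetical helper name, not a function of the package. The
transcripts(), exons() and cds() extractors are the ones provided by
GenomicFeatures.

```r
# Hypothetical helper (not part of GenomicFeatures): count the features
# that each extractor returns for a TranscriptDb object.
# The calls inside require the Bioconductor GenomicFeatures package.
count_txdb_features <- function(txdb) {
  c(
    transcripts = length(GenomicFeatures::transcripts(txdb)),
    exons       = length(GenomicFeatures::exons(txdb)),
    cds         = length(GenomicFeatures::cds(txdb))
  )
}

# Usage, assuming `txdb` was built as shown earlier in the thread:
# count_txdb_features(txdb)
```

With cds_nrow at 0, I would expect the cds count to be zero while the
transcript and exon counts stay usable.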
Marc
On 12/13/2010 07:00 AM, Song Li wrote:
> Hi Hervé,
>
> Thank you for the reply.
>
> I am a little worried about the warning message saying that CDS
> information is not available. However, it does not seem to be a
> crucial factor to consider at this moment.
>
> Best,
> Song