Wrong tx_type when using GenomicFeatures::makeTxDbFromGFF
@karolin-wiedemann-9303
Last seen 9.0 years ago
Germany

The GenomicFeatures package (> v1.20) provides a "tx_type" column in the transcript table of TxDb objects.
I want to read a GTF file that includes the transcript_biotype attribute. As an example, I downloaded and unzipped a GTF file from Ensembl: ftp://ftp.ensembl.org/pub/release-82/gtf/homo_sapiens/Homo_sapiens.GRCh38.82.gtf.gz .
Here is an extract:
1       havana  transcript      11869   14409   .       +       .       gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; havana_gene "OTTHUMG00000000961"; havana_gene_version "2"; transcript_name "DDX11L1-002"; transcript_source "havana"; transcript_biotype "processed_transcript"; havana_transcript "OTTHUMT00000362751"; havana_transcript_version "1"; tag "basic"; transcript_support_level "1";
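
To make explicit where the biotype sits in such a record, here is a minimal base R sketch. It rebuilds the record above with only three of its attributes and splits it into its nine tab-separated columns. The helper `get_attribute` is a hypothetical name introduced for illustration only; it is not part of any package.

```r
# The transcript record from the extract, shortened to three attributes.
# GTF columns are tab-separated; column 9 holds the 'key "value";' pairs.
gtf_line <- paste(
  "1", "havana", "transcript", "11869", "14409", ".", "+", ".",
  paste('gene_id "ENSG00000223972";',
        'transcript_id "ENST00000456328";',
        'transcript_biotype "processed_transcript";'),
  sep = "\t"
)

fields <- strsplit(gtf_line, "\t", fixed = TRUE)[[1]]

# Hypothetical helper: return the value of one attribute from column 9,
# or NA if the key is absent.
get_attribute <- function(attributes, key) {
  pattern <- paste0(key, ' "([^"]*)"')
  hit <- regmatches(attributes, regexec(pattern, attributes))[[1]]
  if (length(hit) == 2) hit[2] else NA_character_
}

feature_type <- fields[3]                                   # "transcript"
tx_id        <- get_attribute(fields[9], "transcript_id")
tx_biotype   <- get_attribute(fields[9], "transcript_biotype")  # "processed_transcript"
```

Column 3 only carries the feature type ("transcript"), while the biotype is one of the attributes in column 9.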

However, I don't get a tx_type like mRNA, snoRNA, etc. Instead, the tx_type column is filled with the word "transcript".

My example:

> txdb <- GenomicFeatures::makeTxDbFromGFF("~/data/Homo_sapiens.GRCh38.82.gtf", format="gtf")
> tx <- GenomicFeatures::transcripts(txdb, columns=c("tx_name","tx_type"))
> head(tx)
GRanges object with 6 ranges and 2 metadata columns:
      seqnames         ranges strand |         tx_name     tx_type
         <Rle>      <IRanges>  <Rle> |     <character> <character>
  [1]        1 [11869, 14409]      + | ENST00000456328  transcript
  [2]        1 [12010, 13670]      + | ENST00000450305  transcript
  [3]        1 [29554, 31097]      + | ENST00000473358  transcript
  [4]        1 [30267, 31109]      + | ENST00000469289  transcript
  [5]        1 [30366, 30503]      + | ENST00000607096  transcript
  [6]        1 [52473, 53312]      + | ENST00000606857  transcript
  -------
  seqinfo: 59 sequences (1 circular) from an unspecified genome; no seqlengths

 

Looking at the code:

rtracklayer::import is used to read the GTF, but only the columns "type", "gene_id", "transcript_id" and "exon_id" are returned. Here "type" corresponds to the third column of the GTF. Maybe I am wrong, but this column never contains transcript type information; it only holds the feature type (e.g. "transcript", "exon").
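
As a sketch of what I would have expected to be possible, rtracklayer::import can be asked for the transcript_biotype attribute directly through its `colnames` argument. The example below assumes rtracklayer is installed and writes a shortened copy of the record from my extract to a temporary file so that it runs on its own. The names `record`, `gtf_file` and `biotype_by_tx` are mine, chosen for illustration.

```r
library(rtracklayer)

# One transcript record from the Ensembl file, shortened to three attributes.
record <- paste(
  "1", "havana", "transcript", "11869", "14409", ".", "+", ".",
  paste('gene_id "ENSG00000223972";',
        'transcript_id "ENST00000456328";',
        'transcript_biotype "processed_transcript";'),
  sep = "\t"
)
gtf_file <- tempfile(fileext = ".gtf")
writeLines(record, gtf_file)

# Read only "transcript" features and keep the columns needed for a lookup.
gtf <- import(gtf_file, format = "gtf",
              colnames = c("type", "transcript_id", "transcript_biotype"),
              feature.type = "transcript")

# "type" is GTF column 3; the biotype comes from the attribute column.
as.character(gtf$type)
gtf$transcript_biotype

# Named vector: transcript ID -> biotype.
biotype_by_tx <- setNames(gtf$transcript_biotype, gtf$transcript_id)
```

With the full Ensembl file in place of the temporary one, such a vector could presumably be indexed by the tx_name values returned by transcripts() to fill in the biotype by hand, but that is a workaround outside the TxDb rather than what makeTxDbFromGFF does.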

 

My questions:
1) Is there something wrong with the way I build the TxDb from the GTF, or have I misunderstood what tx_type means?

2) Why are only predefined tx_types accepted?

 

 

Thanks, Karolin

__________

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.3 LTS

locale:
 [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C               LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8    LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.32.0       XVector_0.10.0             GenomicRanges_1.22.1       BiocGenerics_0.16.1       
 [5] zlibbioc_1.16.0            GenomicAlignments_1.6.1    IRanges_2.4.4              BiocParallel_1.4.0        
 [9] GenomeInfoDb_1.6.1         tools_3.2.2                SummarizedExperiment_1.0.1 parallel_3.2.2            
[13] Biobase_2.30.0             DBI_0.3.1                  lambda.r_1.1.7             futile.logger_1.4.1       
[17] rtracklayer_1.30.1         S4Vectors_0.8.3            futile.options_1.0.0       bitops_1.0-6              
[21] RCurl_1.95-4.7             biomaRt_2.26.1             RSQLite_1.0.0              GenomicFeatures_1.22.5    
[25] Biostrings_2.38.2          Rsamtools_1.22.0           stats4_3.2.2               XML_3.98-1.3

 
