Hi,
I am trying to make a TxDb object from the human Gencode GTF file. However, I can't get it to add the gene names/symbols.
The GTF contains a gene_name attribute:
##description: evidence-based annotation of the human genome (GRCh38), version 34 (Ensembl 100)
##provider: GENCODE
##contact: gencode-help@ebi.ac.uk
##format: gtf
##date: 2020-03-24
chr1 HAVANA gene 11869 14409 . + . gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.1";
Making the TxDb object:
gtf <- makeTxDbFromGFF('~/Downloads/gencode.v34.basic.annotation.gtf')
The resulting gtf object doesn't have a gene name column available:
> columns(gtf)
[1] "CDSCHROM" "CDSEND" "CDSID" "CDSNAME" "CDSPHASE" "CDSSTART" "CDSSTRAND" "EXONCHROM" "EXONEND" "EXONID" "EXONNAME" "EXONRANK"
[13] "EXONSTART" "EXONSTRAND" "GENEID" "TXCHROM" "TXEND" "TXID" "TXNAME" "TXSTART" "TXSTRAND" "TXTYPE"
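One possible workaround, sketched here under assumptions rather than as a confirmed fix, is to build a separate gene_id to gene_name lookup from the "gene" records of the GTF and match it against GENEID afterwards. The sketch below uses only base R and the single record quoted above (assuming the real file is tab-separated, as GTF normally is). The helper get_attr and the column name SYMBOL are hypothetical names chosen for illustration; they are not part of GenomicFeatures.

```r
# Sample "gene" record copied from the GTF excerpt above.
# Assumption: fields are tab-separated in the real file.
gtf_lines <- c(
  paste(
    "chr1", "HAVANA", "gene", "11869", "14409", ".", "+", ".",
    'gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; level 2; hgnc_id "HGNC:37102"; havana_gene "OTTHUMG00000000961.1";',
    sep = "\t"
  )
)

# Hypothetical helper: extract one quoted attribute value from the
# attribute column (9th field) of each record; NA if the key is absent.
get_attr <- function(attrs, key) {
  pattern <- paste0(key, ' "([^"]*)"')
  hits <- regmatches(attrs, regexec(pattern, attrs))
  vapply(
    hits,
    function(m) if (length(m) >= 2) m[2] else NA_character_,
    character(1)
  )
}

# Split each record into its fields and keep only the "gene" features.
fields <- strsplit(gtf_lines, "\t", fixed = TRUE)
is_gene <- vapply(
  fields,
  function(f) length(f) >= 9 && f[3] == "gene",
  logical(1)
)
attrs <- vapply(fields[is_gene], function(f) f[9], character(1))

# Lookup table: one row per gene, keyed by the same ID the TxDb reports as GENEID.
gene_map <- data.frame(
  GENEID = get_attr(attrs, "gene_id"),
  SYMBOL = get_attr(attrs, "gene_name"),
  stringsAsFactors = FALSE
)
print(gene_map)
```

For the full file, gtf_lines could come from readLines() with the leading "##" header lines dropped first. Alternatively, rtracklayer::import() reads a GTF into a GRanges object whose metadata columns include gene_id and gene_name, which may be a more convenient source for the same lookup.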
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.1
[4] purrr_0.3.4 readr_1.3.1 tidyr_1.1.1
[7] tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0
[10] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.12.1 AnnotationFilter_1.12.0
[13] TxDb.Hsapiens.UCSC.hg38.knownGene_3.10.0 GenomicFeatures_1.40.1 AnnotationDbi_1.50.3
[16] Biobase_2.48.0 Gviz_1.32.0 GenomicRanges_1.40.0
[19] GenomeInfoDb_1.24.2 IRanges_2.22.2 S4Vectors_0.26.1
[22] BiocGenerics_0.34.0
loaded via a namespace (and not attached):
[1] colorspace_1.4-1 ellipsis_0.3.1 biovizBase_1.36.0 htmlTable_2.0.1 XVector_0.28.0
[6] fs_1.5.0 base64enc_0.1-3 dichromat_2.0-0 rstudioapi_0.11 bit64_4.0.2
[11] fansi_0.4.1 lubridate_1.7.9 xml2_1.3.2 splines_4.0.2 R.methodsS3_1.8.0
[16] knitr_1.29 Formula_1.2-3 jsonlite_1.7.0 Rsamtools_2.4.0 broom_0.7.0
[21] cluster_2.1.0 dbplyr_1.4.4 png_0.1-7 R.oo_1.23.0 BiocManager_1.30.10
[26] compiler_4.0.2 httr_1.4.2 backports_1.1.8 assertthat_0.2.1 Matrix_1.2-18
[31] lazyeval_0.2.2 cli_2.0.2 acepack_1.4.1 htmltools_0.5.0 prettyunits_1.1.1
[36] tools_4.0.2 gtable_0.3.0 glue_1.4.1 GenomeInfoDbData_1.2.3 rappdirs_0.3.1
[41] tinytex_0.25 Rcpp_1.0.5 cellranger_1.1.0 styler_1.3.2 vctrs_0.3.2
[46] Biostrings_2.56.0 rtracklayer_1.48.0 xfun_0.16 rvest_0.3.6 lifecycle_0.2.0
[51] XML_3.99-0.5 zlibbioc_1.34.0 scales_1.1.1 BSgenome_1.56.0 VariantAnnotation_1.34.0
[56] hms_0.5.3 ProtGenerics_1.20.0 SummarizedExperiment_1.18.2 RMariaDB_1.0.9 RColorBrewer_1.1-2
[61] curl_4.3 memoise_1.1.0 gridExtra_2.3 biomaRt_2.44.1 rpart_4.1-15
[66] latticeExtra_0.6-29 stringi_1.4.6 RSQLite_2.2.0 checkmate_2.0.0 BiocParallel_1.22.0
[71] rlang_0.4.7 pkgconfig_2.0.3 matrixStats_0.56.0 bitops_1.0-6 lattice_0.20-41
[76] GenomicAlignments_1.24.0 htmlwidgets_1.5.1 bit_4.0.4 tidyselect_1.1.0 magrittr_1.5
[81] R6_2.4.1 generics_0.0.2 Hmisc_4.4-0 DelayedArray_0.14.1 DBI_1.1.0
[86] withr_2.2.0 pillar_1.4.6 haven_2.3.1 foreign_0.8-80 survival_3.2-3
[91] RCurl_1.98-1.2 nnet_7.3-14 modelr_0.1.8 crayon_1.3.4 utf8_1.1.4
[96] BiocFileCache_1.12.1 jpeg_0.1-8.1 progress_1.2.2 readxl_1.3.1 data.table_1.13.0
[101] blob_1.2.1 reprex_0.3.0 digest_0.6.25 R.cache_0.14.0 R.utils_2.9.2
[106] openssl_1.4.2 munsell_0.5.0 askpass_1.1