Dear all,
given a GTF file (for example, gencode.v28.basic.annotation.gtf), what is the simplest way to extract a table with the following information :
-- gene_name
-- gene_id
-- transcript_id
many thanks !
bogdan
If you can use an Ensembl GTF, one easy and fast way is to use the refGenome package:
library(refGenome)
gtf = ensemblGenome()
read.gtf(gtf, filename="Homo_sapiens.GRCh38.93.gtf")
genes = gtf@ev$gtf[ ,c("gene_name","gene_id","transcript_id")]
Another option is plyranges:
library(plyranges)
gr <- read_gff("your_file.gtf") %>% select(gene_id, gene_name, transcript_id)
I don't see a read_gtf in plyranges, in either release or devel?
Anyway, this is just a two-liner using basic rtracklayer/GenomicRanges functions.
> library(rtracklayer)
> z <- import("ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.basic.annotation.gtf.gz")
> mcols(z)[,c("gene_id","gene_name","transcript_id")]
DataFrame with 1684537 rows and 3 columns
                  gene_id   gene_name     transcript_id
              <character> <character>       <character>
1       ENSG00000223972.5     DDX11L1                NA
2       ENSG00000223972.5     DDX11L1 ENST00000456328.2
3       ENSG00000223972.5     DDX11L1 ENST00000456328.2
4       ENSG00000223972.5     DDX11L1 ENST00000456328.2
5       ENSG00000223972.5     DDX11L1 ENST00000456328.2
...                   ...         ...               ...
1684533 ENSG00000210195.2       MT-TT ENST00000387460.2
1684534 ENSG00000210195.2       MT-TT ENST00000387460.2
1684535 ENSG00000210196.2       MT-TP                NA
1684536 ENSG00000210196.2       MT-TP ENST00000387461.2
1684537 ENSG00000210196.2       MT-TP ENST00000387461.2
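If none of the packages above work with your file, a dependency-free route is to parse the attribute column directly in base R. The following is only a minimal sketch, not a tested recipe for every GTF flavour: the helpers get_attr() and gtf_to_table() are hypothetical names made up for this example, it assumes the usual 9-column tab-separated layout with attributes written as key "value"; pairs, and it keeps only rows of type "transcript" so that each transcript appears once. The toy lines reuse the DDX11L1 identifiers shown above, but the coordinates are invented.

```r
# Hypothetical helper (not from any package): extract the value of one
# attribute, e.g. gene_id "ENSG...", from the 9th GTF column.
get_attr <- function(attrs, key) {
  # Require the key to start the field or follow a ';' so that a key is
  # not matched inside a longer attribute name.
  pattern <- paste0('(^|;)[[:space:]]*', key, ' "([^"]*)"')
  m <- regmatches(attrs, regexec(pattern, attrs))
  # x[3] is the second capture group (the quoted value); NA if absent.
  vapply(m, function(x) if (length(x) >= 3) x[3] else NA_character_,
         character(1))
}

# Hypothetical helper: one row per transcript with the three requested columns.
gtf_to_table <- function(path) {
  gtf <- read.delim(path, header = FALSE, comment.char = "#", quote = "",
                    stringsAsFactors = FALSE)
  tx <- gtf[gtf$V3 == "transcript", ]
  unique(data.frame(
    gene_name     = get_attr(tx$V9, "gene_name"),
    gene_id       = get_attr(tx$V9, "gene_id"),
    transcript_id = get_attr(tx$V9, "transcript_id"),
    stringsAsFactors = FALSE
  ))
}

# Toy input: identifiers taken from the output above, coordinates made up.
attrs_gene <- 'gene_id "ENSG00000223972.5"; gene_name "DDX11L1";'
attrs_tx <- paste('gene_id "ENSG00000223972.5";',
                  'transcript_id "ENST00000456328.2";',
                  'gene_name "DDX11L1";')
toy <- c(
  "##description: toy example",
  paste("chr1", "HAVANA", "gene", 1, 100, ".", "+", ".", attrs_gene, sep = "\t"),
  paste("chr1", "HAVANA", "transcript", 1, 100, ".", "+", ".", attrs_tx, sep = "\t"),
  paste("chr1", "HAVANA", "exon", 1, 50, ".", "+", ".", attrs_tx, sep = "\t")
)
path <- tempfile(fileext = ".gtf")
writeLines(toy, path)

tab <- gtf_to_table(path)
print(tab)
```

For a real annotation you would call gtf_to_table() on your own file path and, if needed, write the result out with write.table(). Filtering on "transcript" rows drops the gene-level rows, which carry no transcript_id and show up as NA in the rtracklayer output above.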
Thank you, Jaro. I wish it worked. On my Ubuntu system, using a GTF file from the STAR aligner website, it says:
It needs to be either Ensembl or UCSC (you'd use it with gtf=ucscGenome()), that's the limitation. What exactly is the GTF file from the STAR website you describe? Can you post a link to it?
Thank you, Jaro. The links are:
http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/ENSEMBL/homo_sapiens/ENSEMBL.homo_sapiens.release-83/
the file is : Homo_sapiens.GRCh38.83.gtf
http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STARgenomes/GENCODE/GRCh38_Gencode26/
the file is : gencode.v26.primary_assembly.annotation.gtf
During the last analysis, where I mentioned the errors, the GTF files that I used were from GENCODE:
https://www.gencodegenes.org/releases/current.html