How can I get the gene_biotypes for a given gene symbol in R please? Preferably using Ensembl (which in turn is using Havana annotation).
Thank you.
You can get that information using ensembldb.
Assuming you're working with human annotations and want to use Ensembl release 86:
library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86
## Get the gene biotype for the gene SMC4
genes(edb, filter = ~ symbol == "SMC4", return.type = "DataFrame")
DataFrame with 1 row and 10 columns
          gene_id   gene_name   gene_biotype gene_seq_start gene_seq_end
      <character> <character>    <character>      <integer>    <integer>
1 ENSG00000113810        SMC4 protein_coding      160399274    160434962
     seq_name seq_strand seq_coord_system      symbol entrezid
  <character>  <integer>      <character> <character>   <list>
1           3          1       chromosome        SMC4    10051
With return.type you can specify the type of object the function returns (data.frame, DataFrame or the default GRanges). You could also define columns = "gene_biotype" to return just the biotype, gene ID and symbol:
genes(edb, filter = ~ symbol == "SMC4", return.type = "DataFrame", columns = "gene_biotype")
DataFrame with 1 row and 3 columns
    gene_biotype         gene_id      symbol
     <character>     <character> <character>
1 protein_coding ENSG00000113810        SMC4
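If you have several gene symbols rather than one, the same call should work with a filter object that takes a vector of values. This is a minimal sketch, assuming the same EnsDb.Hsapiens.v86 package as above. The symbols XIST and TP53 are only illustrative additions, not from the question.

```r
library(EnsDb.Hsapiens.v86)
edb <- EnsDb.Hsapiens.v86

## Illustrative set of symbols (only SMC4 comes from the example above)
symbols <- c("SMC4", "XIST", "TP53")

## SymbolFilter accepts a character vector; one row is returned per
## matching gene, so a symbol can yield more than one row if it maps
## to several Ensembl gene IDs.
res <- genes(edb,
             filter = SymbolFilter(symbols),
             columns = "gene_biotype",
             return.type = "data.frame")
res

## Symbols without any match in this Ensembl release
setdiff(symbols, res$symbol)
```

The result is not guaranteed to be in the order of the input vector, so match on the symbol column if the order matters.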
For other Ensembl releases and species you can get the respective EnsDb database from AnnotationHub (see e.g. the ensembldb vignette for details).
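As a rough sketch of the AnnotationHub route: the species and the release number (98) below are assumptions chosen for illustration, so check what the query actually returns before picking a record. This needs network access on first use, and AnnotationHub caches the download locally.

```r
library(AnnotationHub)
library(ensembldb)

ah <- AnnotationHub()

## List the EnsDb databases available for a species.
## "Homo sapiens" and the release number 98 are illustrative assumptions.
hits <- query(ah, c("EnsDb", "Homo sapiens", "98"))
hits

## Stop early if nothing matched instead of failing on the subset below
stopifnot(length(hits) > 0)

## Fetch the first matching record; inspect `hits` first if more than
## one record is listed.
edb <- hits[[1]]

## The retrieved EnsDb is used exactly like the packaged one
genes(edb, filter = ~ symbol == "SMC4",
      columns = "gene_biotype", return.type = "DataFrame")
```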