biomaRt finding transcript lenghts
1
@7c8fd686
Last seen 2.2 years ago
Germany
Hello everyone,
i want to get transcript lengths of my ensembl ids, however this function gives me more than one values for each gene. I want to get the longest length for each gene.
For example for gene ENSG00000006210 i got 3285, 3313, 563 and 5796. How can take just the biggest value?
require(biomaRt)
ensembl=useEnsembl(biomart="genes",dataset="hsapiens_gene_ensembl") # dataset and database chosen.
datExpr=read.csv("../clustering genes/filtered_reduced_rc.csv",row.names=1) # datExpr file loaded
genes=rownames(datExpr)
x= getBM(attributes = c('transcript_length'), filters = 'ensembl_gene_id', values = genes, mart = ensembl)
biomaRt
• 3.3k views
@james-w-macdonald-5106
Last seen 8 hours ago
United States
Probably easier would be to use an EnsDb
. If you want naive lengths, you might just use the genes rather than transcripts.
> library(AnnotationHub)
> hub <- AnnotationHub()
> query(hub, c("homo sapiens","ensdb"))
AnnotationHub with 22 records
# snapshotDate(): 2022-04-25
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: EnsDb
# additional mcols(): taxonomyid, genome, description,
# coordinate_1_based, maintainer, rdatadateadded, preparerclass, tags,
# rdatapath, sourceurl, sourcetype
# retrieve records with, e.g., 'object[["AH53211"]]'
title
AH53211 | Ensembl 87 EnsDb for Homo Sapiens
AH53715 | Ensembl 88 EnsDb for Homo Sapiens
AH56681 | Ensembl 89 EnsDb for Homo Sapiens
AH57757 | Ensembl 90 EnsDb for Homo Sapiens
AH60773 | Ensembl 91 EnsDb for Homo Sapiens
... ...
AH89426 | Ensembl 103 EnsDb for Homo sapiens
AH95744 | Ensembl 104 EnsDb for Homo sapiens
AH98047 | Ensembl 105 EnsDb for Homo sapiens
AH100643 | Ensembl 106 EnsDb for Homo sapiens
AH104864 | Ensembl 107 EnsDb for Homo sapiens
## we'll use the latest one
> ensdb <- hub[["AH104864"]]
loading from cache
require("ensembldb")
##Genes
> gns <- genes(ensdb)
## Transcripts
> txs <- transcriptsBy(ensdb)
## max transcript widths
> txwid <- sapply(width(txs), max)
## gene widths, with names
> gnwid <- setNames(width(gns), names(gns))
## compare 20 of them
> data.frame(TxLen = head(txwid, 20), GeneLen = gnwid[names(head(txwid, 20))])
TxLen GeneLen
ENSG00000000003 9996 12884
ENSG00000000005 14950 14950
ENSG00000000419 24121 24274
ENSG00000000457 44266 44637
ENSG00000000460 191079 192074
ENSG00000000938 23122 23122
ENSG00000000971 100479 100723
ENSG00000001036 16909 16909
ENSG00000001084 94506 119630
ENSG00000001167 29430 29430
ENSG00000001460 59930 59936
ENSG00000001461 57175 57175
ENSG00000001497 96234 96262
ENSG00000001561 16700 16700
ENSG00000001617 33984 34031
ENSG00000001626 261556 428852
ENSG00000001629 155410 155410
ENSG00000001630 49517 49817
ENSG00000001631 47132 48669
ENSG00000002016 77509 78318
Login before adding your answer.
Traffic: 546 users visited in the last hour
Cross-posted: https://www.biostars.org/p/9537060/