Hi,
I am writing a workflow to extract Ensembl transcripts using the getBM facility.
This is a portion of the code.
    library(biomaRt)
    library(AnnotationHub)
    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
    rm(list=ls())

    ensembl_mart <- useMart(biomart="ENSEMBL_MART_ENSEMBL", host="grch37.ensembl.org")
    organism <- useDataset(dataset = "hsapiens_gene_ensembl", mart = ensembl_mart)

    txdb <- makeTxDbFromBiomart(dataset = "hsapiens_gene_ensembl")
    trs <- transcripts(txdb)

    genes <- getBM(c("chromosome_name", "ensembl_gene_id", "ensembl_transcript_id",
                     "transcript_start", "transcript_end", "hgnc_symbol"),
                   filters = "biotype",
                   values = c("protein_coding"),
                   mart = organism)

    genes[genes$ensembl_transcript_id=="ENST00000226218",]
The results are as shown below:
      chromosome_name ensembl_gene_id ensembl_transcript_id transcript_start transcript_end hgnc_symbol
    1              17 ENSG00000109072       ENST00000226218         26694297       26697843         VTN
    2              17 ENSG00000109072       ENST00000226218         26694297       26697843       SEBOX
However, if I run the same query through the GRCh37 Ensembl BioMart web interface (I've provided the URL here ... You just need to click "Results"), the results are as shown below:
    Gene stable ID   Transcript stable ID  Transcript start (bp)  Transcript end (bp)  Gene name
    ENSG00000109072  ENST00000226218       26694297               26697843             VTN
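getBM returns two rows for the same transcript, one with the symbol VTN and one with SEBOX, while the web interface returns only the VTN row. This leads me to think that either getBM is doing something really odd or I'm making a mistake somewhere.
I would appreciate it if someone could shed some light on this.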
Thanks,
Himanshu