Hello,
In R, I previously used this piece of code to look up Entrez IDs and HGNC symbols for lists of genes whose Ensembl IDs begin with ENSG000... In this example, my_df is a dataframe where the rownames are the gene IDs (e.g. ENSG...):
library(biomaRt)
# keep only the part of each rowname before any "+"
my_df$ensembl <- sapply( strsplit( rownames(my_df), split="\\+" ), "[", 1 )
ensembl = useMart("ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl", host="www.ensembl.org") # reflects recent change to hosting, as discussed in https://support.bioconductor.org/p/74322/
genemap <- getBM( attributes = c("ensembl_gene_id", "entrezgene", "hgnc_symbol"),
                  filters = "ensembl_gene_id",
                  values = my_df$ensembl,
                  mart = ensembl )
# align the BioMart results with the rows of my_df
idx <- match( my_df$ensembl, genemap$ensembl_gene_id )
my_df$entrez <- genemap$entrezgene[ idx ]
my_df$hgnc_symbol <- genemap$hgnc_symbol[ idx ]
I'd now like to use this on a dataframe where the input row names are transcript IDs (e.g. ENST000...). I'm not sure whether I can do this with BioMart - does anyone know?
The Ensembl mart also provides transcript IDs (via the ensembl_transcript_id attribute), so I don't see why you couldn't do the same with transcript IDs instead of gene IDs. Use listAttributes() to list all the attributes available for your dataset.
H.
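
As a rough sketch of what that could look like, the gene-level code from the question can be adapted by swapping the filter and the match column to ensembl_transcript_id. The function name annotate_transcripts and the new column names are made up for illustration, and it assumes the rownames are unversioned transcript IDs (ENST... without a trailing ".1" etc.) and that the mart object is created with useMart() as in the question. It has not been run against a live mart here, so treat it as a starting point rather than a tested recipe.

```r
# Sketch only: annotate a data frame whose rownames are Ensembl transcript IDs.
# annotate_transcripts is a hypothetical helper name, not part of biomaRt.
annotate_transcripts <- function(my_df, mart) {
  # same rowname clean-up as in the gene-level code
  tx_ids <- sapply(strsplit(rownames(my_df), split = "\\+"), "[", 1)

  # query by transcript ID instead of gene ID
  txmap <- biomaRt::getBM(
    attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol"),
    filters    = "ensembl_transcript_id",
    values     = tx_ids,
    mart       = mart
  )

  # match() keeps the first hit if a transcript is returned more than once
  idx <- match(tx_ids, txmap$ensembl_transcript_id)
  my_df$ensembl_transcript <- tx_ids
  my_df$ensembl_gene       <- txmap$ensembl_gene_id[idx]
  my_df$hgnc_symbol        <- txmap$hgnc_symbol[idx]
  my_df
}

# Usage, assuming `ensembl` is the mart object created in the question:
# my_df <- annotate_transcripts(my_df, ensembl)
```

listFilters(ensembl) shows the available filters in the same way that listAttributes(ensembl) shows the attributes, which is a quick way to confirm that ensembl_transcript_id is accepted as a filter for your dataset.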