Hey,
You can search the biomaRt (Ensembl) datasets like this:
require(biomaRt)
listDatasets(useMart('ensembl'))
Set-up
There seem to be three Cricetulus griseus (C. griseus) datasets; I'll just use the first one:
datasets <- listDatasets(useMart('ensembl'))
# keep only rows whose description (column 2) mentions 'Chinese'
datasets[grep('Chinese', datasets[,2]),]
    dataset                  description                                  version
27  cgchok1gshd_gene_ensembl Chinese hamster CHOK1GS genes (CHOK1GS_HDv1) CHOK1GS_HDv1
28  cgcrigri_gene_ensembl    Chinese hamster CriGri genes (CriGri_1.0)    CriGri_1.0
30  cgpicr_gene_ensembl      Chinese hamster PICR genes (CriGri-PICR)     CriGri-PICR
151 psinensis_gene_ensembl   Chinese softshell turtle genes (PelSin_1.0)  PelSin_1.0
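Note that grep('Chinese', ...) also picks up the Chinese softshell turtle. A minimal sketch of a tighter filter, assuming you only want the hamster builds: it re-enters the four rows printed above as a data frame so it runs offline (in a live session you would start from listDatasets() instead), and hamster_datasets is just an illustrative name.

```r
# Rows copied from the listDatasets() output above, so this runs offline;
# live, you would use: datasets <- listDatasets(useMart('ensembl'))
datasets <- data.frame(
  dataset = c('cgchok1gshd_gene_ensembl', 'cgcrigri_gene_ensembl',
              'cgpicr_gene_ensembl', 'psinensis_gene_ensembl'),
  description = c('Chinese hamster CHOK1GS genes (CHOK1GS_HDv1)',
                  'Chinese hamster CriGri genes (CriGri_1.0)',
                  'Chinese hamster PICR genes (CriGri-PICR)',
                  'Chinese softshell turtle genes (PelSin_1.0)'),
  version = c('CHOK1GS_HDv1', 'CriGri_1.0', 'CriGri-PICR', 'PelSin_1.0'),
  stringsAsFactors = FALSE)

# Match 'Chinese hamster' case-insensitively so the turtle is excluded
is_hamster <- grepl('chinese hamster', datasets$description, ignore.case = TRUE)
hamster_datasets <- datasets[is_hamster, ]
print(hamster_datasets$dataset)
```

The dataset column is what you pass as dataset = ... to useMart().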
# first C. griseus dataset (CHOK1GS_HDv1), plus mouse for the cross-species mapping
hamster <- useMart('ensembl', dataset = 'cgchok1gshd_gene_ensembl')
mouse <- useMart('ensembl', dataset = 'mmusculus_gene_ensembl')
Create a Chinese hamster (C. griseus) lookup table
For C. griseus, the standard attribute names (ensembl_gene_id, external_gene_name) seem to work:
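# pull every hamster gene ID together with its gene name
table <- getBM(
  attributes = c('ensembl_gene_id','external_gene_name'),
  mart = hamster)
# some genes have a blank name; show the first 30 that do have one
head(table[table$external_gene_name != '',], 30)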
ensembl_gene_id external_gene_name
6 ENSCGRG00001000006 ND1
9 ENSCGRG00001000009 mt-Tm
10 ENSCGRG00001000010 ND2
16 ENSCGRG00001000016 COX1
19 ENSCGRG00001000019 COX2
21 ENSCGRG00001000021 ATP8
22 ENSCGRG00001000022 ATP6
23 ENSCGRG00001000023 COX3
25 ENSCGRG00001000025 ND3
27 ENSCGRG00001000027 ND4L
28 ENSCGRG00001000028 ND4
32 ENSCGRG00001000032 ND5
33 ENSCGRG00001000033 ND6
35 ENSCGRG00001000035 CYTB
41 ENSCGRG00001000041 Elp6
44 ENSCGRG00001000044 Zfp449
46 ENSCGRG00001000046 Utp6
47 ENSCGRG00001000047 Ccng2
48 ENSCGRG00001000048 Tespa1
49 ENSCGRG00001000049 Tcea2
50 ENSCGRG00001000050 Rad21
51 ENSCGRG00001000051 Ednrb
52 ENSCGRG00001000052 Tmem98
53 ENSCGRG00001000053 Prok1
54 ENSCGRG00001000054 Emilin3
55 ENSCGRG00001000055 Dna2
57 ENSCGRG00001000057 Lpp
59 ENSCGRG00001000059 Brcc3
61 ENSCGRG00001000061 Rmnd5a
62 ENSCGRG00001000062 Gfy
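Once you have this table, a named character vector is a simple way to translate hamster Ensembl IDs into gene names. The sketch below uses five rows copied from the output above so it runs offline; live, you would build it from the full table returned by getBM(). The names id2name and query, and the ID 'not_a_real_id', are hypothetical and only for illustration.

```r
# A few rows copied from the getBM() output above; live, use the full 'table'
table <- data.frame(
  ensembl_gene_id = c('ENSCGRG00001000016', 'ENSCGRG00001000019',
                      'ENSCGRG00001000050', 'ENSCGRG00001000055',
                      'ENSCGRG00001000059'),
  external_gene_name = c('COX1', 'COX2', 'Rad21', 'Dna2', 'Brcc3'),
  stringsAsFactors = FALSE)

# Drop blank names, then build a lookup: names = Ensembl IDs, values = gene names
named <- table[table$external_gene_name != '', ]
id2name <- setNames(named$external_gene_name, named$ensembl_gene_id)

# Translate a vector of IDs; IDs missing from the lookup come back as NA
query <- c('ENSCGRG00001000050', 'ENSCGRG00001000016', 'not_a_real_id')
translated <- unname(id2name[query])
print(translated)
```

The reverse direction (name to ID) is less safe, because gene names are not guaranteed to be unique within a dataset.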
Now map between Chinese hamster (C. griseus) and mouse (M. musculus)
So, now we can map to mouse:
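getLDS(
  # hamster side: attributes to return, plus the filter/values that select genes
  mart = hamster,
  attributes = c('ensembl_gene_id','external_gene_name','chromosome_name'),
  # linked mouse side: attributes to return for each matched gene
  martL = mouse,
  attributesL = c('mgi_symbol','ensembl_gene_id','chromosome_name','gene_biotype'),
  filters = 'external_gene_name',
  values = c('COX1', 'COX2','Rad21','Dna2','Brcc3'))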
  Gene.stable.ID     Gene.name Chromosome.scaffold.name MGI.symbol Gene.stable.ID.1   Chromosome.scaffold.name.1 Gene.type
1 ENSCGRG00001000016 COX1      MT                       mt-Co1     ENSMUSG00000064351 MT                         protein_coding
2 ENSCGRG00001000019 COX2      MT                       mt-Co2     ENSMUSG00000064354 MT                         protein_coding
3 ENSCGRG00001000055 Dna2      scaffold_33              Dna2       ENSMUSG00000036875 10                         protein_coding
4 ENSCGRG00001000050 Rad21     scaffold_34              Rad21      ENSMUSG00000022314 15                         protein_coding
5 ENSCGRG00001000059 Brcc3     scaffold_11              Brcc3      ENSMUSG00000031201 X                          protein_coding
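In the getLDS() result, the hamster and mouse columns share display names, so the mouse copies end up with a '.1' suffix. A minimal sketch of tidying that up: it re-enters the five rows shown above (so it runs offline), gives the columns explicit hamster/mouse names (the new names are my own choice), and flags pairs whose names differ. COX1/mt-Co1 and COX2/mt-Co2 are paired by getLDS() even though their names do not match, which bears on whether matching by gene name alone is safe.

```r
# The getLDS() result shown above, re-entered so this runs offline
ortho <- data.frame(
  Gene.stable.ID = c('ENSCGRG00001000016', 'ENSCGRG00001000019',
                     'ENSCGRG00001000055', 'ENSCGRG00001000050',
                     'ENSCGRG00001000059'),
  Gene.name = c('COX1', 'COX2', 'Dna2', 'Rad21', 'Brcc3'),
  Chromosome.scaffold.name = c('MT', 'MT', 'scaffold_33', 'scaffold_34', 'scaffold_11'),
  MGI.symbol = c('mt-Co1', 'mt-Co2', 'Dna2', 'Rad21', 'Brcc3'),
  Gene.stable.ID.1 = c('ENSMUSG00000064351', 'ENSMUSG00000064354',
                       'ENSMUSG00000036875', 'ENSMUSG00000022314',
                       'ENSMUSG00000031201'),
  Chromosome.scaffold.name.1 = c('MT', 'MT', '10', '15', 'X'),
  Gene.type = rep('protein_coding', 5),
  stringsAsFactors = FALSE)

# Replace the ambiguous '.1' columns with explicit hamster/mouse names
colnames(ortho) <- c('hamster_gene_id', 'hamster_gene_name', 'hamster_chr',
                     'mouse_symbol', 'mouse_gene_id', 'mouse_chr', 'mouse_biotype')

# TRUE where the hamster name and the mouse MGI symbol agree (ignoring case)
ortho$same_name <- toupper(ortho$hamster_gene_name) == toupper(ortho$mouse_symbol)
print(ortho[, c('hamster_gene_name', 'mouse_symbol', 'same_name')])
```

In a live session you would assign the getLDS() call to ortho and skip the data.frame() step.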
Note the other solution via Orthology.eg.db, mentioned by James here: biomart getLDS giving errors.
Kevin
Is using gene names preferable to looking up orthologues?
Good question