Natasha
Dear List,
I want to retrieve Ensembl gene IDs from BioMart to add to my
microarray analysis output. However, there are some discrepancies
between the Entrez gene IDs and the Ensembl gene IDs that have me confused.
Array used: Illumina HumanHT12 v4.
As an example: GAGE12F, GAGE12G, GAGE12I genes
Microarray: Illumina HT12 v4 output:
Entrez_Gene_ID  Symbol   Chromosome  Probe_Id      Probe_Type  Cytoband  Definition
26748           GAGE12I  X           ILMN_1691563  A           Xp11.23b  Homo sapiens G antigen 12I (GAGE12I), mRNA.
100008586       GAGE12F  X           ILMN_3242920  S           Xp11.23b  Homo sapiens G antigen 12F (GAGE12F), mRNA.
645073          GAGE12G  X           ILMN_1664660  S           Xp11.23b  Homo sapiens G antigen 12G (GAGE12G), mRNA.
Biomart output:
      entrezgene  ensembl_gene_id  hgnc_symbol
1      100008586  ENSG00000241465  GAGE12I
2      100008586  ENSG00000236362  GAGE12F
3      100008586  ENSG00000215269  GAGE12G
1022       26748  ENSG00000241465  GAGE12I
1023       26748  ENSG00000236362  GAGE12F
1024       26748  ENSG00000215269  GAGE12G
2392      645073  ENSG00000241465  GAGE12I
2393      645073  ENSG00000236362  GAGE12F
2394      645073  ENSG00000215269  GAGE12G
So please help me understand why each Entrez gene ID returns multiple
Ensembl gene IDs rather than a single unique match. If I merged the two
tables on Entrez gene ID, based on the above, I would get an incorrectly
merged file. (I cannot use the Illumina HT12 probe IDs as a filter, as I
was informed that in BioMart these are mapped to the HT12 v3 chip.)
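To make the problem concrete, here is a minimal sketch that rebuilds the two small tables above by hand and counts how many Ensembl gene IDs each Entrez gene ID maps to. The object names `illumina`, `bm` and `n_ens` are made up for this illustration; only the values come from the output shown above.

```r
# Toy copies of the two tables shown above (IDs kept as character strings).
illumina <- data.frame(
  Entrez_Gene_ID = c("26748", "100008586", "645073"),
  Symbol         = c("GAGE12I", "GAGE12F", "GAGE12G"),
  stringsAsFactors = FALSE
)
bm <- data.frame(
  entrezgene      = rep(c("100008586", "26748", "645073"), each = 3),
  ensembl_gene_id = rep(c("ENSG00000241465", "ENSG00000236362",
                          "ENSG00000215269"), times = 3),
  hgnc_symbol     = rep(c("GAGE12I", "GAGE12F", "GAGE12G"), times = 3),
  stringsAsFactors = FALSE
)

# Number of distinct Ensembl gene IDs returned for each Entrez gene ID.
n_ens <- tapply(bm$ensembl_gene_id, bm$entrezgene,
                function(x) length(unique(x)))
print(n_ens)

# A plain merge on Entrez gene ID pairs every probe with every Ensembl gene.
merged <- merge(illumina, bm, by.x = "Entrez_Gene_ID", by.y = "entrezgene")
print(nrow(merged))
```

With these three probes, each Entrez gene ID maps to three Ensembl gene IDs, so the merge returns nine rows instead of three.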
R code and sessionInfo:
library(biomaRt)
library(DESeq)
library(gdata)
m_h_a2 # 3995 15 (limma output for a given comparison)
length(unique(m_h_a2$Entrez_Gene_ID)) # 3506
length(unique(m_h_a2$Symbol)) # 3542
length(unique(m_h_a2$Probe_Id)) # 3995
## Non-NA's
mh.ona = na.omit(m_h_a2) # 3912 17
## Unique ids
mh.u.eg = m_h_a2[match(unique(m_h_a2$Entrez_Gene_ID), m_h_a2$Entrez_Gene_ID), ] # 3506 15
mh.u.eg = na.omit(mh.u.eg) # 3505 15
ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
mh_eg.ens <- getBM(attributes = c("entrezgene", "ensembl_gene_id", "hgnc_symbol"),
                   filters = "entrezgene", values = mh.u.eg$Entrez_Gene_ID,
                   mart = ensembl) # 3305 3
### I would like to merge mh.u.eg with mh_eg.ens
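One possible workaround, shown here only as a sketch, is to merge on the Entrez gene ID and then keep only the rows where the Illumina symbol agrees with the HGNC symbol from BioMart. This assumes both sources use the same gene symbols, which may not hold for every probe; rows whose symbols disagree would be dropped silently. The names `probes`, `bm_map`, `merged_all` and `matched` are hypothetical, and the toy data are the three GAGE12 genes from the example above.

```r
# Toy stand-ins for mh.u.eg and mh_eg.ens, using the GAGE12 example only.
probes <- data.frame(
  Entrez_Gene_ID = c("26748", "100008586", "645073"),
  Symbol         = c("GAGE12I", "GAGE12F", "GAGE12G"),
  stringsAsFactors = FALSE
)
bm_map <- data.frame(
  entrezgene      = rep(c("100008586", "26748", "645073"), each = 3),
  ensembl_gene_id = rep(c("ENSG00000241465", "ENSG00000236362",
                          "ENSG00000215269"), times = 3),
  hgnc_symbol     = rep(c("GAGE12I", "GAGE12F", "GAGE12G"), times = 3),
  stringsAsFactors = FALSE
)

# Merge on Entrez gene ID, which yields every probe/Ensembl combination.
merged_all <- merge(probes, bm_map,
                    by.x = "Entrez_Gene_ID", by.y = "entrezgene")

# Keep only rows where the two symbol columns agree (assumption: shared naming).
matched <- merged_all[merged_all$Symbol == merged_all$hgnc_symbol, ]
print(matched)
```

On the toy data this leaves one row per probe, but I do not know whether the symbol match is reliable across the whole array.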
##sessionInfo
R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] scatterplot3d_0.3-33 WriteXLS_2.1.0 gdata_2.8.2
[4] DESeq_1.4.1 locfit_1.5-6 lattice_0.19-23
[7] akima_0.5-4 Biobase_2.12.1 biomaRt_2.8.0
loaded via a namespace (and not attached):
[1] annotate_1.30.0 AnnotationDbi_1.14.1 DBI_0.2-5
[4] genefilter_1.34.0 geneplotter_1.30.0 grid_2.13.0
[7] gtools_2.6.2 RColorBrewer_1.0-2 RCurl_1.6-4
[10] RSQLite_0.9-4 splines_2.13.0 survival_2.36-5
[13] tools_2.13.0 XML_3.4-0 xtable_1.5-6
Many Thanks,
Natasha