Perry Moerland
Dear all, dear Mark,
I'm a grateful user of the illuminaHumanv4.db annotation package. One
of my collaborators is interested in probes mapping to C1orf151
according to the reannotation provided by the package. However, the
re-annotation for these probes seems inconsistent:
> Illids = get("C1orf151",revmap(illuminaHumanv4SYMBOLREANNOTATED))
> Illids
[1] "ILMN_2064311" "ILMN_1657860" "ILMN_1789599" "ILMN_2405009"
> indx = match(Illids,illuminaHumanv4fullReannotation()[,1])
> tab = illuminaHumanv4fullReannotation()[indx,]
> tab[,c(1,4,11:13,16)]
        IlluminaID ProbeQuality EntrezReannotated          GenomicLocation SymbolReannotated EnsemblReannotated
4615  ILMN_2064311          Bad            440574 chr1:19954844:19954893:+          C1orf151    ENSG00000173436
24195 ILMN_1657860      Perfect            440574 chr1:19954399:19954448:+          C1orf151    ENSG00000173436
39363 ILMN_1789599      Perfect            440574 chr1:19984747:19984796:+          C1orf151    ENSG00000158747
46631 ILMN_2405009      Perfect            440574 chr1:19984595:19984644:+          C1orf151    ENSG00000158747
As you can see two probes map to ENSG00000173436 and the other two
probes to ENSG00000158747. This is in agreement with their annotation
on the Ensembl website. The reannotated Entrez Gene ID and the
reannotated symbol, however, seem inconsistent with this. According to
the Ensembl website and according to org.Hs.eg.db the annotation of
the two ENSG IDs is:
> IDs = unlist(mget(tab$EnsemblReannotated,org.Hs.egENSEMBL2EG))
> IDs
 ENSG00000173436  ENSG00000173436 ENSG000001587471 ENSG000001587472 ENSG000001587471 ENSG000001587472
        "440574"         "440574"           "4681"      "100532736"           "4681"      "100532736"
> unlist(mget(IDs,org.Hs.egSYMBOL))
       440574        440574          4681     100532736          4681     100532736
     "MINOS1"      "MINOS1"        "NBL1" "MINOS1-NBL1"        "NBL1" "MINOS1-NBL1"
Note that C1orf151 is an alias for MINOS1, and that MINOS1 and NBL1 are
neighboring genes on chromosome 1; MINOS1-NBL1 is the readthrough
transcript.
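For reference, the mismatch can be reproduced without loading either annotation package. The following is only a minimal sketch in base R, assuming the values from the output above are hard-coded; the names `probes`, `ensembl2entrez` and `Consistent` are illustrative and not part of illuminaHumanv4.db or org.Hs.eg.db.

```r
# Values copied from the tables in this post; nothing is queried live.
probes <- data.frame(
  IlluminaID         = c("ILMN_2064311", "ILMN_1657860",
                         "ILMN_1789599", "ILMN_2405009"),
  EntrezReannotated  = c("440574", "440574", "440574", "440574"),
  EnsemblReannotated = c("ENSG00000173436", "ENSG00000173436",
                         "ENSG00000158747", "ENSG00000158747"),
  stringsAsFactors = FALSE
)

# Ensembl -> Entrez mapping as reported by org.Hs.egENSEMBL2EG above
# (hard-coded here, which is an assumption of this sketch).
ensembl2entrez <- list(
  ENSG00000173436 = "440574",
  ENSG00000158747 = c("4681", "100532736")
)

# A probe counts as consistent if its reannotated Entrez ID is among the
# Entrez IDs listed for its reannotated Ensembl ID.
probes$Consistent <- mapply(
  function(entrez, ensembl) entrez %in% ensembl2entrez[[ensembl]],
  probes$EntrezReannotated, probes$EnsemblReannotated,
  USE.NAMES = FALSE
)

# Probes whose Entrez and Ensembl reannotations disagree.
inconsistent <- probes$IlluminaID[!probes$Consistent]
print(inconsistent)
```

With these inputs, the two probes annotated to ENSG00000158747 are the ones flagged, since 440574 is not among the Entrez IDs that org.Hs.eg.db lists for that Ensembl gene.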
Why does illuminaHumanv4.db link all 4 probes to a single Entrez
Gene ID (440574) and a single symbol (C1orf151)? The more general
question is how identifier conversion is performed for the
re-annotation. I looked for a description in the package
documentation and in Barbosa-Morais et al. (2010), but without success.
best wishes,
Perry
---
Perry Moerland, PhD
Room J1B-215
Bioinformatics Laboratory, Department of Clinical Epidemiology,
Biostatistics and Bioinformatics
Academic Medical Center, University of Amsterdam
Postbus 22660, 1100 DD Amsterdam, The Netherlands
tel: +31 20 5666945
p.d.moerland@amc.uva.nl,
http://www.bioinformaticslaboratory.nl/
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252
attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base
other attached packages:
[1] illuminaHumanv4.db_1.20.0 org.Hs.eg.db_2.10.1       RSQLite_0.11.4            DBI_0.2-7
[5] AnnotationDbi_1.24.0      Biobase_2.22.0            BiocGenerics_0.8.0
loaded via a namespace (and not attached):
[1] AnnotationForge_1.4.0 IRanges_1.20.4 stats4_3.0.2