Question

Annotation package for Codelink 55K Human Array

3

Agaz Hussain Wani ▴ 260

@agaz-hussain-wani-7620

Last seen 7.0 years ago

India

I want to know whether there is an annotation package for the Codelink 55K Human Array, like h10kcod.db, h20kcod.db, or hwgcod.db. How can I annotate data from the Codelink 55K Human Array?

r annotation codelink • 2.9k views

ADD COMMENT • link updated 9.7 years ago by Diego Diez ▴ 760 • written 9.7 years ago by Agaz Hussain Wani ▴ 260

score 1 · Answer 1 · 2015-07-18

1

Diego Diez ▴ 760

@diego-diez-4520

Last seen 4.5 years ago

Japan

Although they seem to be different in NCBI GEO (see here vs. here), I believe the so-called Codelink 55K arrays refer to the (originally named) "whole genome" arrays. So you can try the Human Whole Genome annotation package: http://bioconductor.org/packages/hwgcod.db/ but let me know if you encounter any problems. Note that the package title also includes the 55K (~55,000) tag, which suggests they might be the same.
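As a rough illustration of what "using the package" could look like, here is a minimal sketch that looks up gene symbols and Entrez IDs for a vector of probe IDs. It assumes hwgcod.db and AnnotationDbi are installed; the wrapper name `annotate_probes` is only a hypothetical helper for this example, not something the package provides.

```r
# Minimal sketch: map Codelink probe IDs to gene symbols with hwgcod.db.
# Assumption: hwgcod.db and AnnotationDbi are installed, e.g. via
#   BiocManager::install("hwgcod.db")
# annotate_probes() is a hypothetical helper name used for illustration.
annotate_probes <- function(probe_ids) {
  if (!requireNamespace("hwgcod.db", quietly = TRUE)) {
    stop("Package 'hwgcod.db' is not installed")
  }
  # select() returns a data.frame with one row per probe/annotation pair;
  # probes without a current mapping get NA in the annotation columns.
  AnnotationDbi::select(
    hwgcod.db::hwgcod.db,
    keys = probe_ids,
    columns = c("SYMBOL", "ENTREZID"),
    keytype = "PROBEID"
  )
}
```

For example, `annotate_probes(c("GE766244", "GE521442"))` would query the two probes discussed in the replies of this thread.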

UPDATE

After further investigation, I can confirm that the so-called Human 55K array is indeed a Human Whole Genome array. All the probes listed in GPL15158 (which correspond to the 55K definition in GEO) are present in the Whole Genome array defined in GPL2895, or in the Bioconductor annotation package hwgcod.db.

The definition in GPL2895 (whole genome) contains more probes than the 55K array or the Bioconductor package. This is mainly because it contains probes labelled as "MASK" which were not included in the original chip file used to generate the annotation packages. However, the information in those probes is irrelevant in terms of annotation.
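The comparison described above boils down to a subset check on probe IDs. The sketch below shows the idea on toy data only: the probe IDs, the `TYPE` column name, and the way "MASK" probes are flagged are all assumptions made for illustration, not the actual layout of the GEO platform tables.

```r
# Toy stand-ins for the probe ID columns of the two platform definitions.
# All values here are hypothetical; real tables would be read from GEO.
gpl15158_ids <- c("GE0001", "GE0002", "GE0003")
gpl2895 <- data.frame(
  ID   = c("GE0001", "GE0002", "GE0003", "GE0004"),
  TYPE = c("DISCOVERY", "DISCOVERY", "DISCOVERY", "MASK"),  # assumed labels
  stringsAsFactors = FALSE
)

# Are all 55K probes present in the whole genome definition?
all_present <- all(gpl15158_ids %in% gpl2895$ID)

# Which probes exist only in the whole genome definition, and of what type?
extra_ids   <- setdiff(gpl2895$ID, gpl15158_ids)
extra_types <- table(gpl2895$TYPE[gpl2895$ID %in% extra_ids])

print(all_present)
print(extra_types)
```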

ADD COMMENT • link 9.7 years ago Diego Diez ▴ 760

0

Thanks for your comments. I think there is a mismatch of probes and gene names.

ADD REPLY • link 9.7 years ago Agaz Hussain Wani ▴ 260

0

I tried to annotate with hwgcod.db, but it gives mismatches. For example, in GPL15158 probe GE766244 refers to gene symbol LOC343566, but hwgcod.db returns NA. In the same GPL file, probe GE521442 refers to symbol LOC130951, but from hwgcod.db I got M1AP, and there are many other cases like this.

ADD REPLY • link 9.7 years ago Agaz Hussain Wani ▴ 260

0

See my updated response, which confirms the arrays are identical. The difference in annotation arises because the packages are effectively re-annotated using the latest gene information (some gene IDs have been updated or eliminated). That is one of the points of having the Bioconductor annotations. Indeed, the package annotations could be improved even further if the probe sequences were rematched to the latest genome (instead of using the original mapping). I may consider creating such packages in the future.
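To see how many probes are affected, one could tabulate the differences between the symbols in the GPL file and those returned by the package. The following is only a sketch in base R: the two rows reuse the probes and symbols quoted earlier in this thread, while the column names and status labels are made up for illustration.

```r
# Symbols as listed in the GPL file (values quoted in this thread).
gpl <- data.frame(
  PROBEID    = c("GE766244", "GE521442"),
  GPL_SYMBOL = c("LOC343566", "LOC130951"),
  stringsAsFactors = FALSE
)

# Symbols as reported from hwgcod.db in this thread (NA = no current mapping).
pkg <- data.frame(
  PROBEID = c("GE766244", "GE521442"),
  SYMBOL  = c(NA, "M1AP"),
  stringsAsFactors = FALSE
)

# Join on probe ID and classify each probe (labels are hypothetical).
cmp <- merge(gpl, pkg, by = "PROBEID")
cmp$status <- ifelse(
  is.na(cmp$SYMBOL), "no current symbol",
  ifelse(cmp$SYMBOL == cmp$GPL_SYMBOL, "same", "updated")
)

print(cmp)
print(table(cmp$status))
```

Probes flagged as "updated" or "no current symbol" would then be the ones where the gene record changed or was withdrawn after the GPL file was created.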

ADD REPLY • link 9.7 years ago Diego Diez ▴ 760