Question

hgu133plus2.db: NAs when using getSYMBOL while the up-to-date NetAffx database gives gene names

benoit.tessoulin • 0

@benoittessoulin-12350

Last seen 7.9 years ago

Hi,

I've been working with Affymetrix files for a while. Until now I annotated my raw Affy data directly from NetAffx, merging the expression data with the annotation data by probe_id.

I recently switched to a more straightforward method using the annotate and hgu133plus2.db packages (which correspond to my data). However, some probesets that are associated with genes in NetAffx (both online and in the downloaded database) and that are present in hgu133plus2.db do not come back with gene names.

For instance, I can use two methods to get gene names:

biocLite("hgu133plus2.db")
biocLite("annotate")
library(hgu133plus2.db)
library(annotate)

r=rownames(df_rma)
head(r)
[1] "1053_at"   "117_at"    "121_at"    "1255_g_at" "1316_at"   "1320_at" 

symb_ID=getSYMBOL(r,"hgu133plus2.db") 
head(symb_ID)
  1053_at    117_at    121_at 1255_g_at   1316_at   1320_at
   "RFC2"   "HSPA6"    "PAX8"  "GUCA1A"    "THRA"  "PTPN21"

table(is.na(symb_ID))
FALSE
42358

eligibles=hgu133plus2SYMBOL[r]
annots=toTable(eligibles)
table(is.na(annots$symbol))

FALSE
42358

This looks OK (we start from 54675 rownames, so 12317 probesets aren't annotated). But when I look for a particular probeset of a gene of interest (for instance BBC3, for which Affy lists a probeset), I get nothing:

grep("BBC",annots$symbol)
integer(0)

grep("211692_s_at",annots$probe_id)
integer(0)

So this gene is missing from the getSYMBOL and toTable results. Still, it is correctly annotated in hgu133plus2.db:

grep("211692_s_at",keys(hgu133plus2.db))

[1] 21014

grep("BBC3",(keys(hgu133plus2.db,keytype="SYMBOL")))
[1] 10487

s=select(hgu133plus2.db,keys="211692_s_at",columns="SYMBOL")
'select()' returned 1:many mapping between keys and columns
s
      PROBEID  SYMBOL
1 211692_s_at    BBC3
2 211692_s_at MIR3191
3 211692_s_at MIR3190

So, are probesets that match several transcripts "discarded"? This is not clearly indicated in Marc Carlson's otherwise very good documentation.

Thank you!

Annotation hgu133plus2 • 1.4k views

updated 7.9 years ago by James W. MacDonald 67k • written 7.9 years ago by benoit.tessoulin • 0

score 0 · Answer 1 · 2017-02-13

James W. MacDonald 67k

@james-w-macdonald-5106

Last seen 2 hours ago

United States

You have essentially answered your own question. Using the old style functions like getSYMBOL should be avoided, and you should use the more modern functions select or mapIds, for obvious reasons.
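As a rough illustration of the select/mapIds approach, the sketch below looks up symbols for two probeset IDs taken from the question. It assumes hgu133plus2.db is installed. The `multiVals` settings and the `";"` separator are illustrative choices rather than something prescribed in this thread, and `probes`, `first_sym`, `all_syms` and `annot` are hypothetical names.

```r
library(hgu133plus2.db)  # also attaches AnnotationDbi, which provides mapIds()

# Example probeset IDs taken from the question (in practice: rownames(df_rma))
probes <- c("1053_at", "211692_s_at")

# One symbol per probeset: keep the first match when several genes map
first_sym <- mapIds(hgu133plus2.db, keys = probes, column = "SYMBOL",
                    keytype = "PROBEID", multiVals = "first")

# All symbols per probeset, returned as a named list
all_syms <- mapIds(hgu133plus2.db, keys = probes, column = "SYMBOL",
                   keytype = "PROBEID", multiVals = "list")

# Collapse multi-gene probesets into a single string (assumed ";" separator)
# so the table can be merged with expression data by probeset ID;
# probesets without any symbol stay NA
annot <- data.frame(
  probe_id = names(all_syms),
  symbol   = vapply(all_syms, function(s) {
    if (all(is.na(s))) NA_character_ else paste(s, collapse = ";")
  }, character(1)),
  stringsAsFactors = FALSE
)
```

With `multiVals = "first"` a multi-mapping probeset such as 211692_s_at keeps a single symbol, while the list form keeps every symbol that select() reports for it. Which one is appropriate depends on how the annotation will be used downstream.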