Nianhua Li
Hi,
I received the following email from Lynn Amon and would like to answer
it through the mailing list.
mouse4302 was generated with the function ABPkgBuilder in the package
AnnBuilder. The strategy is to first map probeset IDs to Entrez Gene IDs
and then use the Entrez Gene IDs to retrieve the other annotations (e.g.
symbol, RefSeq, pathway, GO). Because 1415822_at, 1415823_at and
1415824_at were all mapped to Entrez Gene ID 20249, which corresponds to
Scd1, all of their annotations (e.g. symbol, RefSeq) correspond to Scd1.
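A minimal Python sketch of why this happens, assuming plain dictionaries stand in for the package's lookup tables (the helper names `annotate` and `invert` are hypothetical, not AnnBuilder functions). Every downstream annotation is keyed by the Entrez Gene ID, so once the three probesets share Gene ID 20249 they all inherit Scd1, and the reverse map from symbol to probesets grows accordingly:

```python
from collections import defaultdict

# Hypothetical stand-ins for the package's tables. The probeset -> Entrez
# Gene assignments are the ones described in this post; the Gene ID ->
# symbol table holds only the two genes discussed here.
probe_to_entrez = {
    "1415822_at": "20249",
    "1415823_at": "20249",
    "1415824_at": "20249",
}
entrez_to_symbol = {"20249": "Scd1", "20250": "Scd2"}


def annotate(probe_to_entrez, entrez_to_symbol):
    """Derive probeset -> symbol by going through the Entrez Gene ID."""
    return {probe: entrez_to_symbol.get(gene_id)
            for probe, gene_id in probe_to_entrez.items()}


def invert(mapping):
    """Build the reverse map, e.g. symbol -> sorted list of probesets."""
    reverse = defaultdict(list)
    for key, value in mapping.items():
        reverse[value].append(key)
    return {value: sorted(keys) for value, keys in reverse.items()}


symbols = annotate(probe_to_entrez, entrez_to_symbol)
print(symbols)          # each probeset inherits the symbol of its Gene ID
print(invert(symbols))  # symbol -> all probesets now pointing at it
```

All three probesets come back as Scd1, so any symbol-to-probeset lookup for Scd1 will list them too.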
So the question comes down to the mapping from probeset ID to Entrez
Gene ID. For mouse4302, we obtained this mapping in four ways:
(1) get the probeset-to-GenBank-accession mapping from the Affymetrix
annotation, and then use
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz to map GenBank
accessions to Entrez Gene IDs
(2) get the probeset-to-GenBank-accession mapping from the Affymetrix
annotation, and then use
ftp://ftp.ncbi.nih.gov/repository/UniGene/Mus_musculus/Mm.data.gz to map
GenBank accessions to Entrez Gene IDs
(3) get the probeset-to-Entrez-Gene mapping directly from Affymetrix
(4) get the probeset-to-UniGene mapping from Affymetrix, and then use
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz to map UniGene
clusters to Entrez Gene IDs
* Note: the Affymetrix annotation is dated Dec 18, 2005, and the other
sources are dated March 18, 2006.
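Method (1) chains two lookups: probeset to GenBank accession, then accession to Entrez Gene ID. A minimal Python sketch of that chaining follows, assuming the tab-separated gene2accession layout in which column 0 is tax_id, column 1 is GeneID and column 3 is RNA_nucleotide_accession.version, with "#"-prefixed header lines and "-" for missing values; check this against the header of the file you actually download. The helpers `load_accession_to_gene` and `compose` are hypothetical, and returning `None` for an ambiguous accession is an illustrative choice, not necessarily what AnnBuilder does.

```python
from collections import defaultdict


def load_accession_to_gene(lines, tax_id="10090"):
    """Map RNA accession (version stripped) -> set of Entrez Gene IDs.

    Assumed layout (verify against the file header): col 0 = tax_id,
    col 1 = GeneID, col 3 = RNA_nucleotide_accession.version, "-" = missing.
    tax_id 10090 is Mus musculus.
    """
    acc_to_gene = defaultdict(set)
    for line in lines:
        if line.startswith("#"):           # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[0] != tax_id or fields[3] == "-":
            continue
        accession = fields[3].split(".")[0]  # drop the ".version" suffix
        acc_to_gene[accession].add(fields[1])
    return acc_to_gene


def compose(probe_to_acc, acc_to_gene):
    """Chain probeset -> accession -> Gene ID.

    Yields None when the accession is unknown or maps to several genes.
    """
    result = {}
    for probe, acc in probe_to_acc.items():
        genes = acc_to_gene.get(acc, set())
        result[probe] = next(iter(genes)) if len(genes) == 1 else None
    return result


# Reading the real file might look like:
# import gzip
# with gzip.open("gene2accession.gz", "rt") as fh:
#     acc_to_gene = load_accession_to_gene(fh)
```

Methods (2) and (4) follow the same pattern with a different second table; only the parser changes.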
We treat the first two as "trusted" sources and the last two as
supplementary sources. The supplementary sources are not used unless all
the "trusted" sources have missing values for a probeset. Whether we use
the "trusted" or the supplementary sources, if they disagree on the
mapping of a probeset, we pick the value agreed on by the most sources.
If there is a tie, we pick the first one on the list (i.e. arbitrarily).
In the case of 1415822_at, we got 20249, 20250, 20250 and 20250 from the
four methods above, respectively. (BTW, 1415822_at was mapped to GenBank
accession BG060909 in Affymetrix's annotation.) 20250 is the Entrez Gene
record for Scd2, and 20249 is the record for Scd1. The values from the
"trusted" sources are 20249 and 20250. Because 20249 happens to be the
first one on the list, we picked it.
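The selection rule above can be sketched in Python as follows. This is an illustrative reconstruction from the description, not AnnBuilder's code; `pick_gene_id` is a hypothetical name, and `None` stands for a source with no mapping.

```python
from collections import Counter


def pick_gene_id(trusted, supplementary):
    """Majority vote over mapping sources, following the rule described above.

    trusted, supplementary: lists of Entrez Gene IDs (None = no mapping),
    in the fixed order the sources are listed. Supplementary sources are
    consulted only if every trusted source is missing; ties go to the
    value that appears first in list order.
    """
    candidates = [g for g in trusted if g is not None]
    if not candidates:
        candidates = [g for g in supplementary if g is not None]
    if not candidates:
        return None
    counts = Counter(candidates)
    best = max(counts.values())
    # First candidate, in list order, whose count equals the maximum.
    return next(g for g in candidates if counts[g] == best)


# 1415822_at: methods (1)-(2) are trusted, (3)-(4) supplementary.
print(pick_gene_id(["20249", "20250"], ["20250", "20250"]))
```

With the values reported for 1415822_at, the trusted pair ties one vote each and the first entry, 20249 (Scd1), wins; the three supplementary votes for 20250 are never consulted. For comparison, a single vote over all four sources pooled together would return 20250, three votes to one.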
It seems the software picked the wrong value in this particular example,
but it might be a reasonable approach in general. I am not an expert, so
I would appreciate it if someone could comment on this.
Many thanks,
Nianhua Li
computational biology, public health, FHCRC
>
>
> ---------- Forwarded message ----------
> Date: Thu, 08 Jun 2006 07:13:33 -0700
> From: Lynn Amon <lynnamon at u.washington.edu>
> To: Ting-Yuan Liu <tliu at fhcrc.org>
> Subject: Re: annotation services
>
> Hello Ting,
> I just loaded the newest version of mouse4302 from Bioconductor 1.8, and
> it is different from the previous version. By chance, I looked at the
> gene Scd1. Previously, 1415965_at and 1415964_at were the only probe IDs
> given for the gene Scd1, which agrees with the annotation given on the
> Affy website and the chromosome view on Ensembl. Now, in addition to
> those probes, 1415822_at, 1415823_at and 1415824_at, which were formerly
> annotated as Scd2, are given the symbol and RefSeq ID for Scd1, which
> does not agree with Affy or Ensembl. Is there a reason for these changes?
> Should I expect to see many changes in this new annotation file?
> Shouldn't this annotation file agree with the annotations given by Affy?
> Thanks for your help,
> Lynn Amon
>
>