Asta Laiho
I have been using the AnnBuilder function ABPkgBuilder to create an
annotation package for the Affymetrix rat2302 array.
I compared the package that I created (on 23 March) with the rat2302
annotation package released on March 15th (Bioc 2.0) and found
some differences between the two. I am wondering what could
cause these differences.
Here is the package information for the rat2302_1.15.13 package and
for my own package.
BIOC:
Quality control information for rat2302
Date built: Created: Thu Mar 15 18:25:07 2007
Number of probes: 31099
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
rat2302ACCNUM found 31099 of 31099
rat2302CHRLOC found 12177 of 31099
rat2302CHR found 23212 of 31099
rat2302ENTREZID found 23250 of 31099
rat2302ENZYME found 1916 of 31099
rat2302GENENAME found 23229 of 31099
rat2302GO found 14228 of 31099
rat2302MAP found 22526 of 31099
rat2302PATH found 4535 of 31099
rat2302PMID found 14511 of 31099
rat2302REFSEQ found 23157 of 31099
rat2302SUMFUNC found 0 of 31099
rat2302SYMBOL found 23249 of 31099
rat2302UNIGENE found 22825 of 31099
Mappings found for non-probe based rda files:
rat2302CHRLENGTHS found 21
rat2302ENZYME2PROBE found 586
rat2302GO2ALLPROBES found 7649
rat2302GO2PROBE found 5695
rat2302ORGANISM found 1
rat2302PATH2PROBE found 177
rat2302PFAM found 18634
rat2302PMID2PROBE found 24911
rat2302PROSITE found 13246
My own package:
AnnBuilder_1.13.21
Affy: rat230_2.na22.annot.csv.zip (3/9/07)
GO: Built: 08-Feb-2007
Quality control information for rat2302Geno
Date built: Created: Fri Mar 23 14:18:24 2007
Number of probes: 31099
Probe number missmatch: None
Probe missmatch: None
Mappings found for probe based rda files:
rat2302GenoACCNUM found 31099 of 31099
rat2302GenoCHR found 23101 of 31099
rat2302GenoENTREZID found 23140 of 31099
rat2302GenoENZYME found 1913 of 31099
rat2302GenoGENENAME found 23119 of 31099
rat2302GenoGO found 18000 of 31099
rat2302GenoMAP found 22424 of 31099
rat2302GenoPATH found 4539 of 31099
rat2302GenoPMID found 14539 of 31099
rat2302GenoREFSEQ found 23045 of 31099
rat2302GenoSUMFUNC found 0 of 31099
rat2302GenoSYMBOL found 23140 of 31099
rat2302GenoUNIGENE found 22755 of 31099
Mappings found for non-probe based rda files:
rat2302GenoCHRLENGTHS found 21
rat2302GenoENZYME2PROBE found 590
rat2302GenoGO2ALLPROBES found 8313
rat2302GenoGO2PROBE found 6348
rat2302GenoORGANISM found 1
rat2302GenoPATH2PROBE found 177
rat2302GenoPFAM found 18575
rat2302GenoPMID2PROBE found 25257
rat2302GenoPROSITE found 13194
My package was built about a week after the official package, yet there
are noticeable differences. The Bioc package has 110 more Entrez IDs than
my package. This is surprising since, in my experience, the number of
Entrez IDs found should increase over time, not decrease. The Bioc
package also contains 55 more unique Entrez IDs than my package.
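As a rough sketch, the probe-based counts from the two quality control reports above can be put side by side in R. The numbers are copied from the reports in this post; the variable names (`bioc`, `mine`, `diff_counts`) are only illustrative, and CHRLOC is left out because my package has no such mapping.

```r
# Counts copied from the two quality control reports in this post.
bioc <- c(CHR = 23212, ENTREZID = 23250, ENZYME = 1916, GENENAME = 23229,
          GO = 14228, MAP = 22526, PATH = 4535, PMID = 14511,
          REFSEQ = 23157, SYMBOL = 23249, UNIGENE = 22825)
mine <- c(CHR = 23101, ENTREZID = 23140, ENZYME = 1913, GENENAME = 23119,
          GO = 18000, MAP = 22424, PATH = 4539, PMID = 14539,
          REFSEQ = 23045, SYMBOL = 23140, UNIGENE = 22755)

# Positive values mean my package maps more probes than the Bioc package.
diff_counts <- mine[names(bioc)] - bioc

comparison <- data.frame(bioc = bioc, mine = mine, diff = diff_counts)
# Sort so that the mappings where my package has fewer probes come first.
print(comparison[order(comparison$diff), ])
```

This only compares totals. Finding out which probes actually differ would require comparing the mappings inside the two installed packages.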
I use the public representative ID from the rat2302 Affymetrix annotation
file as the primary source for the mappings, and the Unigene and Entrez ID
columns as secondary sources. I have been told that this is also how
the Bioc annotation package is created.
The most striking difference is in the GO information. Even though the
HTML info page of the Bioc package declares that the same release of
the GO data (08-Feb-2007) was used to build it, it seems
that an older version was actually used. What else could
explain why my package contains GO information for almost
4000 more probesets? When I last updated my own package in December, it
also had GO information for about 4000 fewer probesets.
My package is also missing the CHRLOC information entirely. I assume
this is because I get the following error message when building the
annotation package:
"Error in loadFromUrl(srcUrl) : URL
ftp://hgdownload.cse.ucsc.edu/goldenPath/currentGenomesrefLink.txt.gz
is incorrect or the target site is not responding!"
This file no longer exists on the server (it was already removed
last year), and I hope that AnnBuilder can soon be updated
accordingly.
I also ran into a problem with the Unigene data file
format: I had to remove the "//" on the last row of the file before
AnnBuilder was able to process it.
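For anyone who hits the same problem, the manual fix described above could be scripted roughly as follows. This is only a sketch: `strip_trailing_separator` is a hypothetical helper written for this post, not part of AnnBuilder, and the demo records are made-up placeholders rather than real Unigene data.

```r
# Hypothetical helper: remove a "//" separator if it is the last
# non-blank line of a file, and write the result to a new file.
strip_trailing_separator <- function(infile, outfile) {
  lines <- readLines(infile)
  nonblank <- which(nzchar(trimws(lines)))
  if (length(nonblank) > 0) {
    last <- max(nonblank)
    if (trimws(lines[last]) == "//") {
      # Keep everything before the final separator.
      lines <- lines[seq_len(last - 1)]
    }
  }
  writeLines(lines, outfile)
  invisible(length(lines))
}

# Demo on a temporary file with placeholder content.
demo_in <- tempfile(fileext = ".data")
demo_out <- tempfile(fileext = ".data")
writeLines(c("ID          Rn.1", "TITLE       example record", "//",
             "ID          Rn.2", "TITLE       another record", "//"),
           demo_in)
strip_trailing_separator(demo_in, demo_out)
cleaned <- readLines(demo_out)
```

Separators between records are left untouched; only the one at the very end of the file is dropped.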
Two weeks ago (when I built the package) the URL for the KEGG data was
still working, but this week I noticed that the URL
"ftp://ftp.genome.ad.jp/pub/kegg/pathways" had changed to
"ftp://ftp.genome.ad.jp/pub/kegg/pathway". Something else in the file
structure must have changed as well, since fixing just the URL did not help.
I hope that this can also be updated in the package soon.
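Since broken source URLs only show up partway through a build, a quick reachability check before starting might save some time. The sketch below uses only base R; `url_is_reachable` is a hypothetical helper, not an AnnBuilder function, and whether FTP URLs (directories in particular) can be opened this way depends on how R was built, so treat it as an assumption to verify. The demo uses a local temporary file so that it runs without network access.

```r
# Hypothetical helper: TRUE if the URL can be opened for reading.
url_is_reachable <- function(u) {
  con <- url(u)                      # creates the connection, does not open it
  on.exit(close(con), add = TRUE)    # always release the connection
  tryCatch({
    suppressWarnings(open(con, "rb"))
    TRUE
  }, error = function(e) FALSE)
}

# Demo with local file:// URLs (no network needed).
tmp <- tempfile()
writeLines("placeholder", tmp)
existing_ok <- url_is_reachable(paste0("file://", tmp))
missing_ok <- url_is_reachable(paste0("file://", tmp, ".missing"))

# The source URLs mentioned in this post could be checked the same way:
# url_is_reachable("ftp://ftp.genome.ad.jp/pub/kegg/pathway")
```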
I have attached the sessionInfo output below. I also tried creating the same
annotation package with R 2.4.0 and the previous version of
AnnBuilder, but the resulting package was identical to the one I
built now.
Regards,
Asta Laiho
#---------------------------------------------------------------------
> sessionInfo()
R version 2.5.0 Under development (unstable) (2007-02-11 r40701)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "methods"   "base"
other attached packages:
 AnnBuilder    annotate         XML     Biobase     rat2302 rat2302Geno
  "1.13.21"    "1.13.6"     "1.6-0"   "1.13.38"   "1.15.13"     "1.0.0"