I can reproduce your error:
> library(oligo)
>
> filenames = list.celfiles()
> affy.data <- read.celfiles( filenames = filenames)
Loading required package: pd.2.0
Attempting to obtain 'pd.2.0' from BioConductor website.
Checking to see if your internet connection works...
Package 'pd.2.0' was not found in the BioConductor repository.
The 'pdInfoBuilder' package can often be used in situations like this.
Error in read.celfiles(filenames = filenames) :
The annotation package, pd.2.0, could not be loaded.
>
The error occurs because the content of the CEL files does not conform to the specification. Because of that, oligo deduces that these are pd.2.0 arrays, which is not the case (such arrays do not even exist)!
If you check the GEO submission (and publication), you will see that these arrays were actually run on platform GPL570, which corresponds to Affymetrix Human Genome U133 Plus 2.0 Arrays, abbreviated as HG-U133_Plus_2. The corresponding PdInfo package is pd.hg.u133.plus.2. Note that oligo uses so-called PdInfo (probe design info) packages as annotation files.
Having a more detailed look: if you check the help page of the oligo function read.celfiles (type ?read.celfiles), you will notice that under the hood oligo uses functions from affyio to load the CEL files, and that the corresponding annotation package is automagically loaded based on the header of the CEL file (i.e. its cdfName field).
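Judging from the two names seen here (cdfName "HG-U133_Plus_2" goes with pd.hg.u133.plus.2, and the error mentions pd.2.0), the PdInfo package name appears to be derived from cdfName by prefixing "pd.", lower-casing, and turning _ and - into dots. A minimal sketch of that apparent naming rule; pdinfo_name() is a hypothetical helper written for illustration, not oligo's internal code:

```r
# Hypothetical helper (not part of oligo): reproduces the apparent rule that
# maps a CEL header cdfName to a PdInfo package name.
pdinfo_name <- function(cdf_name) {
  # prefix "pd.", lower-case everything, then replace '_' and '-' by '.'
  gsub("[_-]", ".", paste0("pd.", tolower(cdf_name)))
}

pdinfo_name("HG-U133_Plus_2")  # "pd.hg.u133.plus.2"
pdinfo_name("2.0")             # "pd.2.0"
```

So if the header only yields "2.0", this rule produces exactly the pd.2.0 package that oligo then fails to find.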
In other words, the error points to something going wrong when the CEL file header is read. Let's inspect this manually:
> affyio::read.celfile.header("GSM714070_ADV1.CEL", info="full")
$cdfName
[1] "2.0"
$`CEL dimensions`
Cols Rows
1164 1164
$GridCornerUL
[1] 0 0
$GridCornerUR
[1] 0 0
$GridCornerLR
[1] 0 0
$GridCornerLL
[1] 0 0
$DatHeader
[1] " \024 \024 HG-U133 Plus 2.0.1sq \024 \024 \024 \024 \024 \024 \024 \024 \024 "
$Algorithm
[1] "Unknown"
$AlgorithmParameters
[1] "P1:"
$ScanDate
character(0)
>
Note that the cdfName is (just) "2.0". This is weird, because it should be the full name of the chip type used... (Also note that $DatHeader contains the term HG-U133 Plus 2.0.1sq, pointing to Human Genome U133 Plus 2.0 Arrays.)
For comparison, this is the output for one of the (very old) files we generated in our lab:
> affyio::read.celfile.header("A125_01_CTR.CEL", info="full")
$cdfName
[1] "HG-U133_Plus_2"
$`CEL dimensions`
Cols Rows
1164 1164
$GridCornerUL
[1] 213 206
$GridCornerUR
[1] 8404 215
$GridCornerLR
[1] 8400 8395
$GridCornerLL
[1] 208 8386
$DatHeader
[1] "[12..47665] A125_01_ctr:CLS=8609 RWS=8609 XIN=1 YIN=1 VE=30 2.0 06/25/08 11:41:44 50209050 M10 \024 \024 HG-U133_Plus_2.1sq \024 \024 \024 \024 \024 570 \024 25347.941406 \024 3.500000 \024 1.5600 \024 6"
$Algorithm
[1] "Percentile"
$AlgorithmParameters
[1] "Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:7;FullFeatureHeight:7;IgnoreOutliersInShiftRows:FALSE;FeatureExtraction:TRUE;PoolWidthExtenstion:2;PoolHeightExtension:2;UseSubgrids:FALSE;RandomizePixels:FALSE;ErrorBasis:StdvMean;StdMult:1.00000"
$ScanDate
[1] "06/25/08 11:41:44"
>
Note that cdfName is "HG-U133_Plus_2". That is a very well-known chip type!
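Why would the header end up yielding just "2.0"? One plausible explanation, and it is only an assumption (I have not checked the affyio source), is that the chip name is taken from the whitespace-separated DatHeader token ending in .1sq. In the GEO file the name is written with spaces ("HG-U133 Plus 2.0.1sq") instead of underscores, so that token is only "2.0.1sq". The hypothetical helper chip_from_datheader() below illustrates this on the two DatHeader strings shown above:

```r
# Hypothetical helper, NOT affyio's implementation: it only illustrates one
# plausible way a chip name could be extracted from a DatHeader string.
chip_from_datheader <- function(datheader) {
  # split on runs of whitespace; "\024" (DC4) is not whitespace, so it stays a token
  tokens <- strsplit(datheader, "[[:space:]]+")[[1]]
  # take the first token ending in ".1sq" and strip that suffix
  hit <- grep("\\.1sq$", tokens, value = TRUE)
  if (length(hit) == 0) return(NA_character_)
  sub("\\.1sq$", "", hit[1])
}

# DatHeader of the GEO file and of the lab file, as printed above
geo_header <- " \024 \024 HG-U133 Plus 2.0.1sq \024 \024 \024 \024 \024 \024 \024 \024 \024 "
lab_header <- "[12..47665] A125_01_ctr:CLS=8609 RWS=8609 XIN=1 YIN=1 VE=30 2.0 06/25/08 11:41:44 50209050 M10 \024 \024 HG-U133_Plus_2.1sq \024 \024 \024 \024 \024 570 \024 25347.941406 \024 3.500000 \024 1.5600 \024 6"

chip_from_datheader(geo_header)  # "2.0"
chip_from_datheader(lab_header)  # "HG-U133_Plus_2"
```

If this assumption holds, the GEO files do still name the array type, just not in the form the parser expects.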
Doing the same, but now using an affxparser function:
> affxparser::readCelHeader("GSM714070_ADV1.CEL")
$filename
[1] "./GSM714070_ADV1.CEL"
$version
[1] 4
$cols
[1] 1164
$rows
[1] 1164
$total
[1] 1354896
$algorithm
[1] "Unknown"
$parameters
[1] "P1:1;CellMargin:0"
$chiptype
[1] "HG-U133 Plus 2"
$header
[1] "Cols=1164\nRows=1164\nTotalX=1164\nTotalY=1164\nOffsetX=0\nOffsetY=0\nGridCornerUL=0 0\nGridCornerUR=0 0\nGridCornerLR=0 0\nGridCornerLL=0 0\nAxis-invertX=0\nAxisInvertY=0\nswapXY=0\nDatHeader= \024 \024 HG-U133 Plus 2.1sq \024 \024 \024 \024 \024 \024 \024 \024 \024 \nAlgorithm=Unknown\nAlgorithmParameters=P1:1;CellMargin:0\n"
$datheader
[1] " \024 \024 HG-U133 Plus 2.1sq \024 \024 \024 \024 \024 \024 \024 \024 \024 "
$librarypackage
[1] ""
$cellmargin
[1] 0
$noutliers
[1] 146487
$nmasked
[1] 0
>
Note that chiptype is "HG-U133 Plus 2", i.e. the chip name is present, but written with spaces instead of underscores.
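Since read.celfiles reads every file, it may be worth tabulating what both parsers report for all CEL files of the series in one go. A minimal sketch, assuming affyio and affxparser are installed and the CEL files are in the working directory; header_chip_types() is a hypothetical helper name:

```r
# Hypothetical helper: tabulate the chip type as reported by affyio (cdfName)
# and by affxparser (chiptype) for each CEL file.
header_chip_types <- function(files) {
  data.frame(
    file = basename(files),
    affyio_cdfName = vapply(files, function(f)
      affyio::read.celfile.header(f)$cdfName, character(1)),
    affxparser_chiptype = vapply(files, function(f)
      affxparser::readCelHeader(f)$chiptype, character(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Usage, after library(oligo) as in the question:
# header_chip_types(list.celfiles())
```

Any file whose affyio_cdfName column does not read "HG-U133_Plus_2" would presumably trigger the same pd.2.0 lookup.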
The same, using my old file:
> affxparser::readCelHeader("A125_01_CTR.CEL")
$filename
[1] "./A125_01_CTR.CEL"
$version
[1] 4
$cols
[1] 1164
$rows
[1] 1164
$total
[1] 1354896
$algorithm
[1] "Percentile"
$parameters
[1] "Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:7;FullFeatureHeight:7;IgnoreOutliersInShiftRows:FALSE;FeatureExtraction:TRUE;PoolWidthExtenstion:2;PoolHeightExtension:2;UseSubgrids:FALSE;RandomizePixels:FALSE;ErrorBasis:StdvMean;StdMult:1.000000"
$chiptype
[1] "HG-U133_Plus_2"
$header
[1] "Cols=1164\nRows=1164\nTotalX=1164\nTotalY=1164\nOffsetX=0\nOffsetY=0\nGridCornerUL=213 206\nGridCornerUR=8404 215\nGridCornerLR=8400 8395\nGridCornerLL=208 8386\nAxis-invertX=0\nAxisInvertY=0\nswapXY=0\nDatHeader=[12..47665] A125_01_ctr:CLS=8609 RWS=8609 XIN=1 YIN=1 VE=30 2.0 06/25/08 11:41:44 50209050 M10 \024 \024 HG-U133_Plus_2.1sq \024 \024 \024 \024 \024 570 \024 25347.941406 \024 3.500000 \024 1.5600 \024 6\nAlgorithm=Percentile\nAlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004;AlgVersion:6.0;FixedCellSize:TRUE;FullFeatureWidth:7;FullFeatureHeight:7;IgnoreOutliersInShiftRows:FALSE;FeatureExtraction:TRUE;PoolWidthExtenstion:2;PoolHeightExtension:2;UseSubgrids:FALSE;RandomizePixels:FALSE;ErrorBasis:StdvMean;StdMult:1.000000\n"
$datheader
[1] "[12..47665] A125_01_ctr:CLS=8609 RWS=8609 XIN=1 YIN=1 VE=30 2.0 06/25/08 11:41:44 50209050 M10 \024 \024 HG-U133_Plus_2.1sq \024 \024 \024 \024 \024 570 \024 25347.941406 \024 3.500000 \024 1.5600 \024 6"
$librarypackage
[1] ""
$cellmargin
[1] 2
$noutliers
[1] 67
$nmasked
[1] 0
>
... thus: somehow the CEL files that are available at GEO miss some important information, and as a consequence oligo attempts to download a wrong/non-existing annotation package. This behavior can be overcome by manually specifying the chip type; see the post below.
Why this info is missing in the CEL files, and whether these CEL files may have additional issues, is another question...
... to complete my previous post:
Thus, manually setting the chip type/PdInfo package will work: