Mismatched ArrayExpress microarray annotation package
sandmann.t ▴ 70
@sandmannt-11014
Last seen 15 months ago
United States

Dear Bioconductors,

I am contacting you because you are listed as the maintainer of the ArrayExpress Bioconductor package. I tried to use the ArrayExpress function to access a large dataset stored in ArrayExpress, "E-GEOD-5258":

library("ArrayExpress")
GEOD5258.batch <- ArrayExpress( "E-GEOD-5258" )

The function downloads all of the necessary files, but then exits with the following message:

ArrayExpress: Reading data files
Loading required package: pd.u133aaofav2
Attempting to obtain 'pd.u133aaofav2' from BioConductor website.
Checking to see if your internet connection works...
Package 'pd.u133aaofav2' was not found in the BioConductor repository.
The 'pdInfoBuilder' package can often be used in situations like this.
Error in oligo::read.celfiles(filenames = file.path(path, unique(files))) : 
  The annotation package, pd.u133aaofav2, could not be loaded.
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘pd.u133aaofav2’
Error in readAEdata(path = path, files = dataFiles, dataCols = dataCols,  : 
  Unable to read cel files in/tmp/RtmpdN8fuF

The "E-GEOD-5258" dataset contains Affymetrix microarray data from two different array types: A-AFFY-113 and A-AFFY-33. The former causes the problem, because its annotation cannot be found in Bioconductor under its original name. Instead, annotations for this microarray are available under the name "hthgu133a".

I am not the only user who has run into this problem; see e.g. "Help with error: 'pd.u133aaofav2 was not found in the BioConductor repository'".

Unfortunately the ArrayExpress function does not have any arguments that would allow me to manually set the array annotation package. Perhaps that feature would be worth adding?

Many thanks in advance,

Thomas

arrayexpress ArrayExpress hthgu133a.db • 1.8k views
sandmann.t ▴ 70
@sandmannt-11014
Last seen 15 months ago
United States

For anybody else who gets stuck, here is a workaround for the E-GEOD-5258 Connectivity Map dataset, which could be applied to other datasets as well:

library(ArrayExpress)
# global variables
kAccession <- "E-GEOD-5258"
kDataDir <- "~/data_dir"

# retrieve the raw data from ArrayExpress and place them into kDataDir
# (This will download several GB of data.)
dir.create(kDataDir)
mex = ArrayExpress::getAE(kAccession, type = "full", path = kDataDir)

# The following 'ae2bioc' command fails
# mex_raw = ArrayExpress::ae2bioc(mageFiles = mex)  # ERROR
# Instead, we need the sample annotation table from ArrayExpress, which
# lists the array type for each CEL file as well.
phenoData <- read.delim(
  "https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-5258/E-GEOD-5258.sdrf.txt",
  stringsAsFactors = FALSE)

# As expected, there are results from two different array types
table(phenoData$Array.Design.REF)  # A-AFFY-113: 218 arrays, A-AFFY-33: 346 arrays

# We read the raw data from the CEL files into AffyBatch objects, separately
# for each array type.
library(affy)
array_designs <- unique(phenoData$Array.Design.REF)
GEOD5258.batch <- lapply(
  X = setNames(array_designs, array_designs),
  FUN = function(design) {
    cel_files <- subset(phenoData, Array.Design.REF == design)$Array.Data.File
    pdata <- as(phenoData[match(cel_files, phenoData$Array.Data.File), ],
                "AnnotatedDataFrame")
    row.names(pdata) <- cel_files
    read.affybatch(filenames = file.path(kDataDir, cel_files),
                   phenoData = pdata)
  })

# The hthgu133a.db Bioconductor package contains the
# current annotations for the A-AFFY-113 array design.
annotation(GEOD5258.batch[["A-AFFY-113"]]) <- "hthgu133a"

# Now, we can generate RMA summaries & quantile normalized data for
# each array type.
library("hthgu133a.db")  # annotations for A-AFFY-113
library("hgu133a.db")  # annotation for A-AFFY-33
GEOD5258.rma <- lapply(GEOD5258.batch, rma)
GEOD5258.rma  # list of two ExpressionSets
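
In case it helps to check the per-design grouping before downloading several GB of CEL files, here is a minimal sketch in base R. It only illustrates the splitting step from the workaround above. The helper name `cel_files_by_design` and the toy `sdrf` table (including its file names) are made up for illustration; only the column names `Array.Design.REF` and `Array.Data.File` are taken from the SDRF table used above.

```r
# Hypothetical helper (not part of ArrayExpress or affy): group CEL file
# names by array design, using the two SDRF columns referenced above.
cel_files_by_design <- function(sdrf) {
  stopifnot(all(c("Array.Design.REF", "Array.Data.File") %in% names(sdrf)))
  # keep one row per CEL file in case the table lists a file more than once
  sdrf <- sdrf[!duplicated(sdrf$Array.Data.File), ]
  # returns a named list: one character vector of file names per design
  split(sdrf$Array.Data.File, sdrf$Array.Design.REF)
}

# Toy stand-in for the SDRF table; the file names are invented.
sdrf <- data.frame(
  Array.Design.REF = c("A-AFFY-113", "A-AFFY-33", "A-AFFY-113"),
  Array.Data.File = c("a.CEL", "b.CEL", "c.CEL"),
  stringsAsFactors = FALSE)

groups <- cel_files_by_design(sdrf)
lengths(groups)  # number of CEL files per array design
```

With the real `phenoData` table in place of the toy `sdrf`, each element of the resulting list could be passed to `file.path(kDataDir, ...)` in the same way as `cel_files` in the workaround.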