Question

Microarray Data Analysis CDF Problem

sherajilir • 0

Hello Everyone,

I am trying to normalize the dataset GSE45220, which is based on the hugene.1.1.st.v1 platform. However, when I try RMA or gcRMA normalization, I get an error about a missing CDF file, which does not seem to work with this package anyway.

#GSE45220
BiocManager::install("GEOquery")
library(GEOquery)
library(dplyr)
BiocManager::install("gcrma")
library(gcrma)
BiocManager::install("pd.hugene.1.1.st.v1")
library(pd.hugene.1.1.st.v1)
BiocManager::install("hugene10sttranscriptcluster.db")
library(hugene10sttranscriptcluster.db)

untar("GSE45220_RAW.tar", exdir="data1")
cels = list.files("data1/", pattern = "CEL")
sapply(paste("data1", cels, sep="/"), gunzip)
cels = list.files("data1/", pattern = "CEL")
raw.data=ReadAffy(filenames=cels)

Warning message:

The affy package can process data from the Gene ST 1.x series of arrays, but you should consider using either the oligo or xps packages, which are specifically designed for these arrays.

data.rma.norm=rma(raw.data)

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘rma’ for signature ‘"AffyBatch"’

I then tried the oligo package, but the problem persisted. SCAN also did not work, giving me this error:

Error in as.character.default(x) : no method for coercing this S4 class to a vector

Has anyone been through the same experience? I could use the processed data, but I think normalizing it myself would be much better.

I am using R version 3.6.0.

Thank you

microarray normalization • 912 views

updated 5.8 years ago by Guido Hooiveld ★ 4.1k • written 5.8 years ago by sherajilir • 0

score 1 · Answer 1 · 2019-07-15

Hi,

First, some remarks:

- Use only the oligo package to read and process these files, not affy.
- There is indeed no CDF available for the HuGene ST 1.1 arrays. That is why you need the oligo-based framework with the corresponding platform design (PdInfo) package.
- gcRMA normalization cannot be applied to these arrays, because they carry only PM probes (the MM probes that gcRMA requires are missing).
- After normalization, I strongly recommend adding annotation info using the function annotateEset() from the affycoretools package.
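The remarks above rely on four packages: oligo, pd.hugene.1.1.st.v1 (loaded automatically by read.celfiles(), but it has to be installed), hugene11sttranscriptcluster.db and affycoretools. As a minimal sketch, the following checks which of them are not yet installed. The helper name missing_packages is hypothetical and only used for illustration; the actual install call is printed rather than executed, so nothing is downloaded by accident.

```r
# Hypothetical helper: return the packages from 'pkgs' that are not installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Packages named in the remarks above.
needed <- c("oligo", "pd.hugene.1.1.st.v1",
            "hugene11sttranscriptcluster.db", "affycoretools")

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  # Show the command to run; BiocManager itself comes from CRAN.
  message("Run: BiocManager::install(c(",
          paste0('"', to_install, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```

Note that the annotation package for this array is hugene11sttranscriptcluster.db, not the hugene10sttranscriptcluster.db loaded in the question.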

Some code to get you started:

> library(oligo)
> library(hugene11sttranscriptcluster.db)
> library(affycoretools)
> 
> # read in CEL files
> path <- "./GSE45220_RAW"  # directory with (compressed) CEL files
> raw.data <- read.celfiles(filenames = list.celfiles(path, full.names = TRUE, listGzipped = TRUE))
Loading required package: pd.hugene.1.1.st.v1
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : ./GSE45220_RAW/GSM1099310_PS01_uns_A05_2.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099311_PS02_NaB_A07.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099312_PS03_NaB_Cip125_A09.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099313_PS04_uns_B05.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099314_PS05_NaB_B07.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099315_PS06_NaB_Cip150_B09.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099316_PS07_uns_C05.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099317_PS08_NaB_C07.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099318_PS09_NaB_Cip150_C09.CEL.gz
> 
> # RMA normalization
> norm.data <- oligo::rma(raw.data, target = "core")
Background correcting
Normalizing
Calculating Expression
> 
> # add annotation info (using functionality affycoretools)
> norm.data <- annotateEset(norm.data,  hugene11sttranscriptcluster.db)
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> norm.data
ExpressionSet (storageMode: lockedEnvironment)
assayData: 33297 features, 9 samples 
  element names: exprs 
protocolData
  rowNames: GSM1099310_PS01_uns_A05_2.CEL.gz
    GSM1099311_PS02_NaB_A07.CEL.gz ...
    GSM1099318_PS09_NaB_Cip150_C09.CEL.gz (9 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM1099310_PS01_uns_A05_2.CEL.gz
    GSM1099311_PS02_NaB_A07.CEL.gz ...
    GSM1099318_PS09_NaB_Cip150_C09.CEL.gz (9 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData
  featureNames: 7892501 7892502 ... 8180418 (33297 total)
  fvarLabels: PROBEID ENTREZID SYMBOL GENENAME
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: pd.hugene.1.1.st.v1
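
As a possible next step, the sample names in the object above are still the raw file names. The sketch below derives a small sample table from those file names with base R only. It assumes the names follow the layout GSM id, PS id, treatment label, well position (my reading of the nine names listed above, not something documented by GEO), and the column names gsm, sample and treatment are made up for illustration. The last part only runs if norm.data from the session above exists.

```r
# File names as printed by read.celfiles() above.
cel_files <- c(
  "GSM1099310_PS01_uns_A05_2.CEL.gz",
  "GSM1099311_PS02_NaB_A07.CEL.gz",
  "GSM1099312_PS03_NaB_Cip125_A09.CEL.gz",
  "GSM1099313_PS04_uns_B05.CEL.gz",
  "GSM1099314_PS05_NaB_B07.CEL.gz",
  "GSM1099315_PS06_NaB_Cip150_B09.CEL.gz",
  "GSM1099316_PS07_uns_C05.CEL.gz",
  "GSM1099317_PS08_NaB_C07.CEL.gz",
  "GSM1099318_PS09_NaB_Cip150_C09.CEL.gz"
)

# Assumed layout: <GSM id>_<PS id>_<treatment>_<well>[_<n>].CEL.gz
pattern <- "^(GSM[0-9]+)_(PS[0-9]+)_(.+)_[A-Z][0-9]{2}(_[0-9]+)?\\.CEL\\.gz$"
stopifnot(all(grepl(pattern, cel_files)))  # fail early if a name deviates

treatment <- sub(pattern, "\\3", cel_files)
samples <- data.frame(
  gsm       = sub(pattern, "\\1", cel_files),
  sample    = sub(pattern, "\\2", cel_files),
  treatment = factor(treatment, levels = unique(treatment)),
  row.names = cel_files,
  stringsAsFactors = FALSE
)
print(table(samples$treatment))

# Only if the ExpressionSet from the session above is available:
if (exists("norm.data")) {
  ids <- Biobase::sampleNames(norm.data)
  norm.data$treatment <- samples[ids, "treatment"]  # add a phenoData column
  expr <- Biobase::exprs(norm.data)                 # log2 expression matrix
  annot <- Biobase::fData(norm.data)                # PROBEID, ENTREZID, SYMBOL, GENENAME
}
```

Keeping the parsing separate from the ExpressionSet makes it easy to check the derived labels against the GEO sample descriptions before using them in a design matrix.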