Question

Processing Agilent data with limma

Agaz Hussain Wani ▴ 260

I am trying to process Agilent data with the limma R package. For GSE10469, I used the following code:

raw_data <- read.maimages(pdata[,1], source = "agilent") # the pdata file holds the group information
I get the error:
Error in readGenericHeader(fullname, columns = columns, sep = sep) :
  Specified column headings not found in file

When I try

raw_data <- read.maimages(pdata[,1], source = "agilent", green.only = TRUE)
Read GSM264878.txt
Error in RG[[a]][, i] <- obj[, columns[[a]]] :
  number of items to replace is not a multiple of replacement length

And also

raw_data <- read.maimages(pdata[,1], source = "agilent", green.only = FALSE)
Error in readGenericHeader(fullname, columns = columns, sep = sep) :
  Specified column headings not found in file

For GSE32006

raw_data <- read.maimages(pdata[,1], source = "agilent")
Error in readGenericHeader(fullname, columns = columns, sep = sep) :
  Specified column headings not found in file

And

raw_data <- read.maimages(pdata[,1], source = "agilent", green.only = TRUE)
Read GSM792633.txt
Read GSM792634.txt
Read GSM792635.txt
Read GSM792636.txt
Read GSM792637.txt
Read GSM792638.txt
Error in readGenericHeader(fullname, columns = columns) :
  Specified column headings not found in file

The files that were read successfully from GSE32006 are gene expression arrays, whereas the files that failed are exon arrays.

How can I deal with these issues?

limma agilent microarrays • 3.2k views

Updated 7.0 years ago by Gordon Smyth 52k • written 7.0 years ago by Agaz Hussain Wani ▴ 260

Code snippets are not useful! Unless you show exactly what you did, you are expecting people to guess at what you might have done, and most people are too busy to bother with such things. You need to show a short, self-contained bit of code (i.e., one that anybody can run) that shows exactly what you did and where the error is.

7.0 years ago James W. MacDonald 68k

score 2 · Answer 1 · 2018-04-08

The first problem occurs when you try to read in single-channel data as if it were two-color. By default, read.maimages() expects two-color data, so you have to tell it explicitly to read the green channel only by specifying green.only=TRUE.
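If it is unclear whether a set of files is single-channel or two-color, one way to check is to look for a red-channel column heading near the top of each file before calling read.maimages(). The following is a minimal base R sketch, not part of limma. The helper name `has_heading` is hypothetical, and the sketch assumes that two-color Agilent Feature Extraction files carry a heading such as `rMedianSignal` that single-channel files lack.

```r
# Hypothetical helper (not a limma function): is `heading` one of the
# tab-delimited fields found in the first `n` lines of `file`?
has_heading <- function(file, heading, n = 50) {
  lines <- readLines(file, n = n, warn = FALSE)
  fields <- unlist(strsplit(lines, "\t", fixed = TRUE, useBytes = TRUE))
  heading %in% fields
}

# Assumption: "rMedianSignal" marks a red channel in two-color files.
# files <- dir(pattern = "^GSM.*\\.txt")
# two_color <- vapply(files, has_heading, logical(1), heading = "rMedianSignal")
# If none of the files is two-color, read them with green.only = TRUE.
```

A file with no such heading is probably single-channel, which is the case where green.only=TRUE is needed.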

The second problem occurs when people upload "raw" data files to GEO that have been edited or corrupted, and are therefore no longer in proper Agilent format.

In the case of GSE10469, it appears that someone (presumably one of the authors) has opened the first file, GSM264878.txt, in Excel and then written it out again with extra rows and an extra column. The other files are OK. You can fix the problem simply by changing the order of the files when you read them in, so that GSM264878 is not the first file:

> files
 [1] "GSM264878.txt.gz" "GSM264879.txt.gz" "GSM264880.txt.gz" "GSM264881.txt.gz"
 [5] "GSM264882.txt.gz" "GSM264883.txt.gz" "GSM264884.txt.gz" "GSM264885.txt.gz"
 [9] "GSM264886.txt.gz" "GSM264887.txt.gz" "GSM264888.txt.gz" "GSM264889.txt.gz"
> x <- read.maimages(files[c(2,1,3:12)], source="agilent", green.only=TRUE)
Read GSM264879.txt.gz 
Read GSM264878.txt.gz 
Read GSM264880.txt.gz 
Read GSM264881.txt.gz 
Read GSM264882.txt.gz 
Read GSM264883.txt.gz 
Read GSM264884.txt.gz 
Read GSM264885.txt.gz 
Read GSM264886.txt.gz 
Read GSM264887.txt.gz 
Read GSM264888.txt.gz 
Read GSM264889.txt.gz
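The index vector c(2,1,3:12) is specific to this twelve-file series. As a slightly more general sketch, the reordering could be wrapped in a small base R helper. The name `move_from_front` is hypothetical and not part of limma, and the sketch assumes you already know which file is the edited one.

```r
# Hypothetical helper: put the first file that is not in `suspect`
# at the front, keeping the other files in their original order.
move_from_front <- function(files, suspect) {
  ok <- files[!files %in% suspect]
  if (length(ok) == 0) stop("no well-formed file to put first")
  first <- ok[1]
  c(first, files[files != first])
}

files <- sprintf("GSM%d.txt.gz", 264878:264889)
reordered <- move_from_front(files, "GSM264878.txt.gz")

# Requires limma and the raw files in the working directory:
# x <- read.maimages(reordered, source = "agilent", green.only = TRUE)
```

For these twelve files the result is the same order as files[c(2,1,3:12)].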

In the case of GSE32006, you can't expect to read in gene expression and exon arrays with the same read command because they have different probe sets. You naturally have to read and analyse the gene arrays and the exon arrays separately.
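One way to organise the separate reads is to split the file names by array type first. This is only a sketch under stated assumptions: the `platform` column is hypothetical (the original pdata file is not shown), `exon_array_1.txt` is a placeholder file name, and whether green.only=TRUE is appropriate for the exon arrays needs to be checked against those files.

```r
# Assumption: `pdata` has one row per array, with file names in column 1
# and a hypothetical `platform` column separating gene from exon arrays.
pdata <- data.frame(
  file = c("GSM792633.txt", "GSM792634.txt", "exon_array_1.txt"),
  platform = c("gene", "gene", "exon"),
  stringsAsFactors = FALSE
)

# One character vector of file names per platform.
files_by_platform <- split(pdata[, 1], pdata$platform)

# Each group is then read and analysed on its own
# (requires limma and the raw files):
# library(limma)
# gene <- read.maimages(files_by_platform$gene, source = "agilent", green.only = TRUE)
# exon <- read.maimages(files_by_platform$exon, source = "agilent", green.only = TRUE)
```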