Question

Question about read.maimages

SSK ▴ 10

@ssk-7679

United Kingdom

Hi,

I am interested in analysis of E-GEOD-37442. I have tried reading the data using:

#load package
library(limma)

#Change directory
setwd('/Users/E-GEOD-37442/')

#Read the sample and data relationship format (SDRF) file
SDRF <- read.delim("E-GEOD-37442.sdrf.txt",check.names=FALSE,stringsAsFactors=FALSE)

#Read data
x <- read.maimages(SDRF[,"Array Data File"],source="agilent")

However, I get the following error:

Error in data.frame(FileName = files, row.names = names, stringsAsFactors = FALSE) :
duplicate row.names: GSM919399_0361_ULS_252483510001_S01_GE2_105_Jan09_2_2, GSM919398_0361_ULS_252483510001_S01_GE2_105_Jan09_2_1, GSM919397_0361_ULS_252483510001_S01_GE2_105_Jan09_1_4, GSM919396_0361_ULS_252483510001_S01_GE2_105_Jan09_1_3, GSM919395_0361_ULS_252483510001_S01_GE2_105_Jan09_1_2, GSM919394_0361_ULS_252483510001_S01_GE2_105_Jan09_1_1

I had a few questions:

1) How can I tell whether E-GEOD-37442 is single-channel or two-color?

2) How can I resolve the duplicate row.names error? The duplicated file names appear to have different descriptions in the sdrf.txt file.

I would really really appreciate any help.

Thank you!

differential gene expression limma read.maimages • 1.7k views

ADD COMMENT • link updated 9.8 years ago by Gordon Smyth 52k • written 9.8 years ago by SSK ▴ 10

score 1 · Answer 1 · 2015-05-11

You seem to be using the 'spaghetti' method of analyzing data - throw it against the wall and see if it sticks. I would instead recommend using the RTFM method of analyzing data. If you are planning to take on the responsibility of analyzing data, then you have to take on the responsibility. In the end you have to explain the data, what you did, why you did it, and what the results mean. If you simply get nursed through the analysis by people on this list, how do you think you will be able to do any of that?

In the interest of teaching somebody to fish:

1) You got these data from ArrayExpress. Do you think perhaps that the ArrayExpress people require those who upload data to their site to say what kind of array it was, and how it was processed? Have you looked there? I got the answer to your question in like 3 minutes.

2) I'll give you this one, as it's a bit obscure. Note that the SDRF file appears to have everything in there twice:

> zz <- read.delim("E-GEOD-37442.sdrf.txt")

> zz$Array.Data.File
 [1] GSM919399_0361_ULS_252483510001_S01_GE2_105_Jan09_2_2.txt
 [2] GSM919399_0361_ULS_252483510001_S01_GE2_105_Jan09_2_2.txt
 [3] GSM919398_0361_ULS_252483510001_S01_GE2_105_Jan09_2_1.txt
 [4] GSM919398_0361_ULS_252483510001_S01_GE2_105_Jan09_2_1.txt
 [5] GSM919397_0361_ULS_252483510001_S01_GE2_105_Jan09_1_4.txt
 [6] GSM919397_0361_ULS_252483510001_S01_GE2_105_Jan09_1_4.txt
 [7] GSM919396_0361_ULS_252483510001_S01_GE2_105_Jan09_1_3.txt
 [8] GSM919396_0361_ULS_252483510001_S01_GE2_105_Jan09_1_3.txt
 [9] GSM919395_0361_ULS_252483510001_S01_GE2_105_Jan09_1_2.txt
[10] GSM919395_0361_ULS_252483510001_S01_GE2_105_Jan09_1_2.txt
[11] GSM919394_0361_ULS_252483510001_S01_GE2_105_Jan09_1_1.txt
[12] GSM919394_0361_ULS_252483510001_S01_GE2_105_Jan09_1_1.txt
6 Levels: GSM919394_0361_ULS_252483510001_S01_GE2_105_Jan09_1_1.txt ...
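
To see why the doubled entries trigger the error, the failing call quoted in the error message can be reproduced in base R. This is only a sketch: the two-file list is a shortened stand-in for the real SDRF column, and stripping the ".txt" extension with sub() is an assumption about how read.maimages derives the row names.

```r
# Shortened stand-in for the SDRF column: every file listed twice
files <- rep(c("GSM919399_0361_ULS_252483510001_S01_GE2_105_Jan09_2_2.txt",
               "GSM919398_0361_ULS_252483510001_S01_GE2_105_Jan09_2_1.txt"),
             each = 2)

# Assumption: row names are the file names without their extension
array_names <- sub("\\.txt$", "", files)

# Same call as in the error message; duplicated row names make it fail.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  data.frame(FileName = files, row.names = array_names,
             stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
print(result)

# A quick check that can be run on any file list before reading it
has_duplicates <- anyDuplicated(files) > 0
print(has_duplicates)
```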

Seems to me a better course of action is

> library(ArrayExpress)  # provides getAE()
> library(limma)         # provides read.maimages()
> z <- getAE("E-GEOD-37442")
> dat <- read.maimages(z$rawFiles, source = "agilent")

And then process using limma, which you can learn about by perusing the limma User's Guide.

score 0 · Answer 2 · 2015-05-12

I can see that you have tried to follow the code example given in the Corn Oil case study (Section 17.4) of the limma User's Guide. The difference is that the limma case study was single channel whereas this data is two color, and the SDRF file from ArrayExpress has a row for each channel rather than a row for each array.
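
One way to check the channel count from the SDRF itself is to count the distinct dye labels recorded per data file. The sketch below uses a toy table rather than the real file, and it assumes the SDRF has a column named "Label" holding the dye names; check the actual column names with colnames(SDRF) before relying on it.

```r
# Toy stand-in for an SDRF table (NOT the real E-GEOD-37442 file).
# The "Label" column name is an assumption about the SDRF layout.
sdrf <- data.frame(
  "Array Data File" = rep(c("array1.txt", "array2.txt"), each = 2),
  "Label"           = rep(c("Cy3", "Cy5"), times = 2),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Number of distinct dye labels recorded for each data file
labels_per_file <- tapply(sdrf[["Label"]], sdrf[["Array Data File"]],
                          function(x) length(unique(x)))
print(labels_per_file)

# Two labels per file suggests two-color arrays; one suggests single-channel
two_color <- all(labels_per_file == 2)
print(two_color)
```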

There are many ways to read the data successfully. James has shown you a neat way using the getAE() function from the ArrayExpress package. That function will however re-download all the data afresh.

Alternatively, you could have simply used:

x <- read.maimages(SDRF[c(1,3,5,7,9,11),"Array Data File"],source="agilent")
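
If you would rather not hard-code the row indices, the same selection can be made by dropping repeated file names. A minimal sketch with a toy table standing in for the SDRF (the file names here are hypothetical):

```r
# Toy stand-in for the SDRF: one row per channel, so each file appears twice
SDRF <- data.frame(
  "Array Data File" = rep(c("a_1.txt", "a_2.txt", "a_3.txt"), each = 2),
  "Label"           = rep(c("Cy3", "Cy5"), times = 3),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Keep each file name once, in order of first appearance
files <- unique(SDRF[, "Array Data File"])
print(files)

# With the real SDRF and data files in the working directory, one would then call:
# x <- read.maimages(files, source = "agilent")
```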

In the end though, two color microarray data requires a lot of special attention to analyse well. You just have to understand the design of the experiment fully, and the only way to do that is to read the SDRF file carefully. When you do that for this data you will see that the first three arrays are simply dye flips (technical replicates) of the second three. I don't see any alternative to constructing an informative Targets file by hand. For this data, I made the following:

> Targets
                                                   FileName    Cy3    Cy5 BiologicalRep
1 GSM919394_0361_ULS_252483510001_S01_GE2_105_Jan09_1_1.txt 41DegC 37DegC             1
2 GSM919395_0361_ULS_252483510001_S01_GE2_105_Jan09_1_2.txt 41DegC 37DegC             2
3 GSM919396_0361_ULS_252483510001_S01_GE2_105_Jan09_1_3.txt 41DegC 37DegC             3
4 GSM919397_0361_ULS_252483510001_S01_GE2_105_Jan09_1_4.txt 37DegC 41DegC             1
5 GSM919398_0361_ULS_252483510001_S01_GE2_105_Jan09_2_1.txt 37DegC 41DegC             2
6 GSM919399_0361_ULS_252483510001_S01_GE2_105_Jan09_2_2.txt 37DegC 41DegC             3
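
For reference, a data frame with the contents printed above can be typed in directly. This sketch only reproduces that printout; the dye and replicate assignments are the ones shown in the table, not something the code derives from the SDRF.

```r
# Array suffixes as they appear in the file names above
suffix <- c("1_1", "1_2", "1_3", "1_4", "2_1", "2_2")

Targets <- data.frame(
  FileName = paste0("GSM9193", 94:99,
                    "_0361_ULS_252483510001_S01_GE2_105_Jan09_",
                    suffix, ".txt"),
  Cy3 = rep(c("41DegC", "37DegC"), each = 3),   # dye flip after array 3
  Cy5 = rep(c("37DegC", "41DegC"), each = 3),
  BiologicalRep = rep(1:3, times = 2),          # arrays 1 and 4 share a sample, etc.
  stringsAsFactors = FALSE
)
print(Targets)
```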

Then you can read in the data by:

RG <- read.maimages(Targets,source="agilent")

You will create a design matrix like this:

design <- modelMatrix(Targets, ref="37DegC")
design <- cbind(DyeEffect=1, design)

Note that you need to estimate a probe-specific dye-effect, otherwise the dye flips will be wasted.
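
To make the structure of that design matrix concrete, here is a base R sketch that builds it by hand from the dye assignments in the Targets table. It is meant to mirror what the two lines above should produce, assuming log-ratios are defined as Cy5 over Cy3; it is not a substitute for calling modelMatrix on the real Targets frame.

```r
# Dye assignments copied from the Targets table
Cy3 <- rep(c("41DegC", "37DegC"), each = 3)
Cy5 <- rep(c("37DegC", "41DegC"), each = 3)

# With log-ratios defined as Cy5 / Cy3, the 41DegC-vs-37DegC coefficient is
# +1 when 41DegC is labelled with Cy5 and -1 when it is labelled with Cy3
treatment <- (Cy5 == "41DegC") - (Cy3 == "41DegC")

# The intercept-like column absorbs the probe-specific dye effect
design <- cbind(DyeEffect = 1, "41DegC" = treatment)
print(design)
```

Because the sign of the treatment column flips between the first and second three arrays while the dye-effect column stays at 1, the two effects can be estimated separately.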

Later on during the analysis, you will need to use duplicateCorrelation to estimate the correlation between the technical replicates (which will be negative).
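
A rough sketch of that last step is given below, run on simulated noise rather than the real arrays so that it is self-contained. The matrix M, the probe count and the variable names are all hypothetical; with real data M would be the normalized log-ratios. The block is skipped if limma or statmod is not installed.

```r
if (requireNamespace("limma", quietly = TRUE) &&
    requireNamespace("statmod", quietly = TRUE)) {
  library(limma)
  set.seed(1)

  # Simulated log-ratios: 200 hypothetical probes x 6 arrays (NOT real data)
  M <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6)

  # Design with a dye-effect column and the 41DegC-vs-37DegC contrast
  design <- cbind(DyeEffect = 1, "41DegC" = c(-1, -1, -1, 1, 1, 1))

  # Arrays 1 and 4, 2 and 5, 3 and 6 are dye-flipped technical replicates
  biolrep <- c(1, 2, 3, 1, 2, 3)

  # Estimate the correlation between technical replicates
  corfit <- duplicateCorrelation(M, design, block = biolrep)
  print(corfit$consensus.correlation)

  # Fit the linear model using the estimated correlation
  fit <- lmFit(M, design, block = biolrep,
               correlation = corfit$consensus.correlation)
  fit <- eBayes(fit)
  top <- topTable(fit, coef = "41DegC")
  print(head(top))
}
```

On pure noise the estimated correlation carries no meaning; the point is only the order of the calls and how the block argument ties each dye-flipped pair together.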