Question

Creating design matrix in LIMMA

0

Entering edit mode

sgblackpearl • 0

@sgblackpearl-12124

Last seen 8.1 years ago

Hi everyone, I am stuck with creating design matrix in LIMMA for my microarray experimental design. The experiment consists of 40 Agilent arrays. The subjects were treated with two stress conditions (S1 and S2). In each stress condition, the subjects were sampled in 5 timepoints (0, 6, 12, 24 and 48). Microarray experiment was performed in such a way that every fourth array of a particular timepoint, irrespective of stress treatment, is dye swapped. Basically, it is a time course experiment on the same subject with two different stresses. I want to to know which are the genes important/responsible for stress 1 or 2. The target file looks like below:

SampleNumber	FileName	Cy3	Cy5	Stress
1	Array1	pool1	S1_Time0	S1
2	Array2	pool1	S1_Time0	S1
3	Array3	pool1	S1_Time0	S1
4	Array4	S1_Time0	pool1	S1
5	Array5	pool1	S1_Time6	S1
6	Array6	pool1	S1_Time6	S1
7	Array7	pool1	S1_Time6	S1
8	Array8	S1_Time6	pool1	S1
9	Array9	pool1	S1_Time12	S1
10	Array10	pool1	S1_Time12	S1
11	Array11	pool1	S1_Time12	S1
12	Array12	S1_Time12	pool1	S1
13	Array13	pool1	S1_Time24	S1
14	Array14	pool1	S1_Time24	S1
15	Array15	pool1	S1_Time24	S1
16	Array16	S1_Time24	pool1	S1
17	Array17	pool1	S1_Time48	S1
18	Array18	pool1	S1_Time48	S1
19	Array19	pool1	S1_Time48	S1
20	Array20	S1_Time48	pool1	S1
21	Array21	pool2	S2_Time0	S2
22	Array22	pool2	S2_Time0	S2
23	Array23	pool2	S2_Time0	S2
24	Array24	S2_Time0	pool2	S2
25	Array25	pool2	S2_Time6	S2
26	Array26	pool2	S2_Time6	S2
27	Array27	pool2	S2_Time6	S2
28	Array28	S2_Time6	pool2	S2
29	Array29	pool2	S2_Time12	S2
30	Array30	pool2	S2_Time12	S2
31	Array31	pool2	S2_Time12	S2
32	Array32	S2_Time12	pool2	S2
33	Array33	pool2	S2_Time24	S2
34	Array34	pool2	S2_Time24	S2
35	Array35	pool2	S2_Time24	S2
36	Array36	S2_Time24	pool2	S2
37	Array37	pool2	S2_Time48	S2
38	Array38	pool2	S2_Time48	S2
39	Array39	pool2	S2_Time48	S2
40	Array40	S2_Time48	pool2	S2

limma microarray R • 2.4k views

ADD COMMENT • link 8.1 years ago sgblackpearl • 0

1

Entering edit mode

You haven't explained what pool1 and pool2 are. I'm guessing they are all the samples for a given stressor pooled together? Also, you say that your experiment involved 4 subjects per stressor, but you've left out important details. Were the same subjects used for both stressors? Were the same subjects used for all time points? Your wording does not make this clear. You need to fully document all the known variables in your experimental design before you can construct your design matrix.

ADD REPLY • link 8.1 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Yes, pool1 & 2 are all the samples pooled from all the time points for the respective stressors. The subjects used for all the time points both stressors are same.

ADD REPLY • link 8.1 years ago sgblackpearl • 0

1

Entering edit mode

Ok, then that is an important feature of the experimental design, and you to add a column with that information to your sample table.

ADD REPLY • link 8.1 years ago Ryan C. Thompson ★ 7.9k

0

Entering edit mode

Thanks Gordon and Ryan for your valuable inputs.

I have another set of experiments which is pretty much similar to that of the above one. There also I have two different stressors, and similar dye swap but the reference RNA is same for all the arrays (essentially 'pool1' and 'pool2' from the above experiment is a common 'pool' here). I want to know genes common and unique to different stressors.

ADD REPLY • link 8.1 years ago sgblackpearl • 0

1

Entering edit mode

With a single common reference, you just need one call to modelMatrix. The multiple calls and concatenation with blockDiag were all to handle the multiple reference pools. As for the subject effects, I'm not sure how to incorporate those into a 2-color design.

ADD REPLY • link 8.1 years ago Ryan C. Thompson ★ 7.9k

score 2 · Answer 1 · 2017-01-08

The fact that two different pools were used means that you almost have two independent experiments. For a study like this, the standard limma method is to make two design matrices and combine them together:

design.S1 <- modelMatrix(targets[ 1:20,], ref="pool1")
design.S2 <- modelMatrix(targets[21:40,], ref="pool2")
design <- blockDiag(design.S1, design.S2)

I would also add an intercept term to allow for probe-specific dye effects, which tend to be important for Agilent arrays:

design <- cbind(Dye=1, design)

This is equivalent to Ryan's code, but a bit simpler.

score 1 · Answer 2 · 2017-01-08

Since your two stressors apparently have different reference samples (pool1 and pool2), I think the most straightforward way to implement this is by making a separate design matrix for each stressor and then combining them into a single block-diagonal matrix using Matrix::bdiag (correction: limma already provides a blockDiag function for exactly this purpose):

library(limma)
library(stringr)

# Indicate the reference sample for each row
targets$Ref <- str_replace(targets$Stress, "S", "pool")

# Split by reference and make the design matrix for each split
targets.split <- split(targets, targets$Ref)
designs <- lapply(targets.split, function(tgt) {
    modelMatrix(targets=droplevels(tgt), ref=tgt$Ref[1])
})

# Combine the separate design matrices into one
design <- as.matrix(do.call(blockDiag, designs))
## Ensure that the design has the rows in the same order as targets
design <- design[rownames(targets),]

## Define contrasts for an ANOVA test of all time points for each
## stressor
stressor1_contrasts <- sprintf("S1_Time%i - S1_Time0", c(6, 12, 24, 48))
stressor2_contrasts <- sprintf("S2_Time%i - S2_Time0", c(6, 12, 24, 48))
## Build the contrast matrix for each ANOVA test
stressor1_cmat <- makeContrasts(contrasts=stressor1_contrasts, levels=design)
stressor2_cmat <- makeContrasts(contrasts=stressor2_contrasts, levels=design)

From there, you can use the usual limma procedure of lmFit, contrasts.fit, eBayes, and topTable to get a table of genes for each ANOVA test.

If there are any additional variables to be taken into account, such as a subject effect, you'll need to reply with more information.

score 0 · Answer 3 · 2017-01-08

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 23 months ago

United States

Woops -- provided a boneheaded answer in haste. Had a momentary lapse of reason regarding the nuts and bolts of two-color arrays.

The details of my original answer have been redacted to mitigate any damage this might cause a wary future traveler ...

Thanks to Ryan for the correction

ADD COMMENT • link 8.1 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

I don't think this design is appropriate for a 2-color array with dye swaps. I don't have a lot of experience with dye swaps, but my understanding is that a 2-color array design typically doesn't have an intercept (since every array represents the fold change between 2 samples), and a dye-swap is implemented by flipping the signs on the corresponding row of the design matrix, not by adding an additional coefficient. This is how limma::modelMatrix does it. (Although you might also add an dye swap coefficient in addition to the sign flip, to control for the possibility that the dyes are not equivalent). This design also doesn't account for the use of two different reference samples.

I also think the design for this experiment needs to have an interaction between time and stressor, since the experimenter is interested in the separate time effects for each stressor.

ADD REPLY • link 8.1 years ago Ryan C. Thompson ★ 7.9k