Katrina bell
@katrina-bell-3021
Dear All,
I have limited experience with the analysis of two-colour arrays and would
appreciate your thoughts on the following design matrix I have
constructed, and on ways to deal with batch effects. It is an unconnected
design of 5 conditions, each with its own reference. I should say
these are Agilent 44k mouse arrays.
There are 39 arrays in total. There are technical dye swaps (which I
know aren't the best, but it's what I have) and the arrays were
run in 3 lots (so 3 batches). I have attempted to take care
of the technical replicates (same mouse/RNA, just labelled in reverse
for a dye swap) using the block argument in lmFit.
Targets
SlideNumber Cy3 Cy5 Batch Bioreps
1 WTCL WTCR 1 1
2 CoffeeAL CoffeeAR 1 2
3 CoffeeCL CoffeeCR 1 3
4 WTBL WTBR 1 4
5 WTBR WTBL 1 4
6 CoffeeAR CoffeeAL 1 2
7 WTCR WTCL 1 1
8 CoffeeCL CoffeeCR 1 5
9 WTBL WTBR 1 6
10 WTAL WTAR 1 7
11 CoffeeAL CoffeeAR 1 8
12 CoffeeAR CoffeeAL 1 8
13 WTAR WTAL 1 7
14 WTBR WTBL 1 6
15 CoffeeCR CoffeeCL 1 5
16 WTAL WTAR 1 9
17 WTCR WTCL 1 10
18 CoffeeCR CoffeeCL 1 3
19 WTAR WTAL 1 9
20 WTBR WTBL 2 11
21 WTBL WTBR 2 11
22 WTCR WTCL 2 12
23 WTCL WTCR 2 12
24 WTAR WTAL 2 13
25 WTAL WTAR 2 13
26 CoffeeCR CoffeeCL 2 14
27 CoffeeCL CoffeeCR 2 14
28 CoffeeAR CoffeeAL 2 15
29 CoffeeAL CoffeeAR 2 15
30 WTBR WTBL 3 16
31 WTBL WTBR 3 16
32 WTCR WTCL 3 17
33 WTCL WTCR 3 17
34 WTAR WTAL 3 18
35 WTAL WTAR 3 18
36 CoffeeCR CoffeeCL 3 19
37 CoffeeCL CoffeeCR 3 19
38 CoffeeAR CoffeeAL 3 20
39 CoffeeAL CoffeeAR 3 20
RG <- read.maimages(target, source= "agilent", path="ArrayFiles")
RG <- backgroundCorrect(RG, method="subtract")
I also tried normexp with an offset of 50, but the M values of a couple
of my arrays became really constricted after this...
RG$genes$Status <-controlStatus(spottypes, RG)
Matching patterns for: ControlType GeneName
Found 43379 probe
Found 604 DarkCorner
Found 14 GE_BrightCorner
Found 1486 controls
Setting attributes: values Color
> w <-modifyWeights(array(1,dim(RG)), RG$genes$Status,
c("BrightCorner", "DarkCorner"), c(0,0))
bioreps<-c(1,2,3,4,4,2,1,5,6,7,8,8,7,6,5,9,10,3,9,11,11,12,12,13,13,14
,14,15,15,16,16,17,17,18,18,19,19,20,20 )
MA <-normalizeWithinArrays(RG, weights=w, method='loess')
MA<-normalizeBetweenArrays(MA, method="Aquantile")
MA.avg <-avereps(MA, ID=MA$genes$ProbeName)
corfit<-duplicateCorrelation(MA.avg, block=bioreps)
> corfit$consensus
[1] -0.812968
As this is an unconnected design, I followed Gordon's advice in
another posting and made my own design matrix.
> design
Dye WTAR WTBR WTCR CoffeeAR CoffeeCr
[1,] 1 0 0 -1 0 0
[2,] 1 0 0 0 -1 0
[3,] 1 0 0 0 0 -1
[4,] 1 0 -1 0 0 0
[5,] 1 0 1 0 0 0
[6,] 1 0 0 0 1 0
[7,] 1 0 0 1 0 0
[8,] 1 0 0 0 0 -1
[9,] 1 0 -1 0 0 0
[10,] 1 -1 0 0 0 0
[11,] 1 0 0 0 -1 0
[12,] 1 0 0 0 1 0
[13,] 1 1 0 0 0 0
[14,] 1 0 1 0 0 0
[15,] 1 0 0 0 0 1
[16,] 1 -1 0 0 0 0
[17,] 1 0 0 1 0 0
[18,] 1 0 0 0 0 -1
[19,] 1 1 0 0 0 0
[20,] 1 0 1 0 0 0
[21,] 1 0 -1 0 0 0
[22,] 1 0 0 1 0 0
[23,] 1 0 0 -1 0 0
[24,] 1 1 0 0 0 0
[25,] 1 -1 0 0 0 0
[26,] 1 0 0 0 0 1
[27,] 1 0 0 0 0 -1
[28,] 1 0 0 0 1 0
[29,] 1 0 0 0 -1 0
[30,] 1 0 1 0 0 0
[31,] 1 0 -1 0 0 0
[32,] 1 0 0 1 0 0
[33,] 1 0 0 -1 0 0
[34,] 1 1 0 0 0 0
[35,] 1 -1 0 0 0 0
[36,] 1 0 0 0 0 1
[37,] 1 0 0 0 0 -1
[38,] 1 0 0 0 1 0
[39,] 1 0 0 0 -1 0
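A hand-typed design matrix of this size is easy to get wrong, so it can help to generate it from the targets and compare. The following is a minimal base R sketch, not the poster's code. It assumes the sign convention that can be read off the posted matrix (+1 when the R sample is on Cy3, -1 when it is on Cy5); the names `cy3`, `pair` and `orient` are hypothetical. Under that convention slide 18 (Cy3 = CoffeeCR) gets +1, whereas row 18 as posted has -1, so either the targets table or the matrix may contain a typo there.

```r
# Cy3 labels for slides 1-39, copied from the Targets table.
# Batches 2 and 3 (slides 20-39) repeat the same ten labels.
cy3 <- c("WTCL", "CoffeeAL", "CoffeeCL", "WTBL", "WTBR", "CoffeeAR", "WTCR",
         "CoffeeCL", "WTBL", "WTAL", "CoffeeAL", "CoffeeAR", "WTAR", "WTBR",
         "CoffeeCR", "WTAL", "WTCR", "CoffeeCR", "WTAR",
         rep(c("WTBR", "WTBL", "WTCR", "WTCL", "WTAR", "WTAL",
               "CoffeeCR", "CoffeeCL", "CoffeeAR", "CoffeeAL"), times = 2))

conditions <- c("WTA", "WTB", "WTC", "CoffeeA", "CoffeeC")

# Condition of each slide: strip the trailing L/R from the Cy3 label.
pair <- sub("[LR]$", "", cy3)

# Assumed convention: +1 if the R sample is on Cy3, -1 if the L sample is.
orient <- ifelse(grepl("R$", cy3), 1, -1)

# One column per condition, plus a constant column for the dye effect.
design <- cbind(Dye = 1,
                sapply(conditions, function(k) orient * (pair == k)))
colnames(design)[-1] <- paste0(conditions, "R")

# Rows to compare against the hand-entered matrix.
print(design[c(1, 5, 18), ])
```

Each row should contain exactly one non-zero condition entry; `rowSums(abs(design[, -1]))` is a quick way to confirm that.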
fit<- lmFit(MA.avg,design, block=bioreps, cor=corfit$consensus)
fit2 <-eBayes(fit)
WTAR<- topTable(fit2, coef=2, adjust="BH")
Is it sensible to add a coefficient for each of the batches to my
design with this set of arrays? So three extra columns? I am unsure whether I
have enough information in my arrays for this, and I would appreciate
your advice/suggestions. I am especially concerned about how to
treat the batch effect because the second batch has some background
hybridisation issues, judging from the FE array images. Although
these arrays look OK on the QC in limma (just more constricted M values than
the other arrays), I am concerned about them. I did remove the whole
batch and ran the analysis with the remaining 29 arrays to gauge what
effect they were having on the analysis, and found that I got even fewer
statistically significant genes.
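Whether the arrays carry enough information for batch terms can be checked numerically before fitting anything. The sketch below is one possible parameterisation, stated as an assumption rather than a recommendation: because the response is a log-ratio, a batch term acts like a batch-specific dye (intercept) effect, so it uses one dye column per batch in place of the single Dye column. The sign convention (+1 when the R sample is on Cy3) and the names `cy3`, `cond`, `design_batch` and `design_over` are hypothetical.

```r
# Cy3 labels and batch for slides 1-39, copied from the Targets table.
cy3 <- c("WTCL", "CoffeeAL", "CoffeeCL", "WTBL", "WTBR", "CoffeeAR", "WTCR",
         "CoffeeCL", "WTBL", "WTAL", "CoffeeAL", "CoffeeAR", "WTAR", "WTBR",
         "CoffeeCR", "WTAL", "WTCR", "CoffeeCR", "WTAR",
         rep(c("WTBR", "WTBL", "WTCR", "WTCL", "WTAR", "WTAL",
               "CoffeeCR", "CoffeeCL", "CoffeeAR", "CoffeeAL"), times = 2))
batch <- factor(rep(1:3, times = c(19, 10, 10)))

# Condition columns, built with the assumed sign convention.
conditions <- c("WTA", "WTB", "WTC", "CoffeeA", "CoffeeC")
pair   <- sub("[LR]$", "", cy3)
orient <- ifelse(grepl("R$", cy3), 1, -1)
cond   <- sapply(conditions, function(k) orient * (pair == k))

# Assumed parameterisation: one dye column per batch, replacing "Dye".
dye_by_batch <- model.matrix(~ 0 + batch)
design_batch <- cbind(dye_by_batch, cond)

# Estimable if the rank equals the number of columns.
rank_batch <- qr(design_batch)$rank
resid_df   <- nrow(design_batch) - rank_batch

# For contrast: three batch columns added on top of the Dye column.
design_over <- cbind(Dye = 1, design_batch)
rank_over   <- qr(design_over)$rank

cat("columns:", ncol(design_batch), "rank:", rank_batch,
    "residual df:", resid_df, "\n")
cat("with Dye kept, columns:", ncol(design_over), "rank:", rank_over, "\n")
```

For the targets as posted, the batch-specific version has 8 columns and rank 8, leaving 31 residual degrees of freedom. Keeping the Dye column and adding three batch columns gives 9 columns of rank 8, because the three batch indicators sum to the Dye column. This only shows that the coefficients are estimable; it does not address the array quality concerns in batch 2.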
So, my questions are:
1. Is the design matrix I constructed OK?
2. How can I deal with the batch effect in my set of arrays?
3. Any other comments welcome!
Thanks for any help you are able to give.
Cheers
Katrina