Question

Two colour arrays - advice on unconnected design, batch effect

0

Entering edit mode

Katrina bell ▴ 30

@katrina-bell-3021

Last seen 10.2 years ago

Dear All, I have limited experience with analysis of two colour arrays and would appreciate your thoughts on the follow design matrix I have constructed and ways to deal with batch effects. It is an unconnected design, of 5 conditions each with their own reference. I should say, these are agilent 44k mouse arrays. There are 39 arrays in total. There are technical dye swaps (which I know aren't the best- but its what I have got) and these arrays have been performed in 3 lots (so 3 batches). I have attempted to take care of the technical replicates (same mouse/RNA, just labelled in reverse for a dye swap) using the block function in lmFit Targets SlideNumber Cy3 Cy5 Batch Bioreps 1 WTCL WTCR 1 1 2 CoffeeAL CoffeeAR 1 2 3 CoffeeCL CoffeeCR 1 3 4 WTBL WTBR 1 4 5 WTBR WTBL 1 4 6 CoffeeAR CoffeeAL 1 2 7 WTCR WTCL 1 1 8 CoffeeCL CoffeeCR 1 5 9 WTBL WTBR 1 6 10 WTAL WTAR 1 7 11 CoffeeAL CoffeeAR 1 8 12 CoffeeAR CoffeeAL 1 8 13 WTAR WTAL 1 7 14 WTBR WTBL 1 6 15 CoffeeCR CoffeeCL 1 5 16 WTAL WTAR 1 9 17 WTCR WTCL 1 10 18 CoffeeCR CoffeeCL 1 3 19 WTAR WTAL 1 9 20 WTBR WTBL 2 11 21 WTBL WTBR 2 11 22 WTCR WTCL 2 12 23 WTCL WTCR 2 12 24 WTAR WTAL 2 13 25 WTAL WTAR 2 13 26 CoffeeCR CoffeeCL 2 14 27 CoffeeCL CoffeeCR 2 14 28 CoffeeAR CoffeeAL 2 15 29 CoffeeAL CoffeeAR 2 15 30 WTBR WTBL 3 16 31 WTBL WTBR 3 16 32 WTCR WTCL 3 17 33 WTCL WTCR 3 17 34 WTAR WTAL 3 18 35 WTAL WTAR 3 18 36 CoffeeCR CoffeeCL 3 19 37 CoffeeCL CoffeeCR 3 19 38 CoffeeAR CoffeeAL 3 20 39 CoffeeAL CoffeeAR 3 20 RG <- read.maimages(target, source= "agilent", path="ArrayFiles") RG <- backgroundCorrect(RG, method="subtract") I also tried using normexp, offset 50, but a couple of my arrays M values really constricted after this... RG$genes$Status <-controlStatus(spottypes, RG) Matching patterns for: ControlType GeneName Found 43379 probe Found 604 DarkCorner Found 14 GE_BrightCorner Found 1486 controls Setting attributes: values Color > w <-modifyWeights(array(1,dim(RG)), RG$genes$Status, c("BrightCorner", "DarkCorner"), c(0,0)) bioreps<-c(1,2,3,4,4,2,1,5,6,7,8,8,7,6,5,9,10,3,9,11,11,12,12,13,13,14 ,14,15,15,16,16,17,17,18,18,19,19,20,20 ) MA <-normalizeWithinArrays(RG, weights=w, method='loess') MA<-normalizeBetweenArrays(MA, method="Aquantile") MA.avg <-avereps(MA, ID=MA$genes$ProbeName) corfit<-duplicateCorrelation(MA.avg, block=biorep) > corfit$consensus [1] -0.812968 As this is an unconnected design, I followed Gordon's advice in another posting and made my own design matrix. > design Dye WTAR WTBR WTCR CoffeeAR CoffeeCr [1,] 1 0 0 -1 0 0 [2,] 1 0 0 0 -1 0 [3,] 1 0 0 0 0 -1 [4,] 1 0 -1 0 0 0 [5,] 1 0 1 0 0 0 [6,] 1 0 0 0 1 0 [7,] 1 0 0 1 0 0 [8,] 1 0 0 0 0 -1 [9,] 1 0 -1 0 0 0 [10,] 1 -1 0 0 0 0 [11,] 1 0 0 0 -1 0 [12,] 1 0 0 0 1 0 [13,] 1 1 0 0 0 0 [14,] 1 0 1 0 0 0 [15,] 1 0 0 0 0 1 [16,] 1 -1 0 0 0 0 [17,] 1 0 0 1 0 0 [18,] 1 0 0 0 0 -1 [19,] 1 1 0 0 0 0 [20,] 1 0 1 0 0 0 [21,] 1 0 -1 0 0 0 [22,] 1 0 0 1 0 0 [23,] 1 0 0 -1 0 0 [24,] 1 1 0 0 0 0 [25,] 1 -1 0 0 0 0 [26,] 1 0 0 0 0 1 [27,] 1 0 0 0 0 -1 [28,] 1 0 0 0 1 0 [29,] 1 0 0 0 -1 0 [30,] 1 0 1 0 0 0 [31,] 1 0 -1 0 0 0 [32,] 1 0 0 1 0 0 [33,] 1 0 0 -1 0 0 [34,] 1 1 0 0 0 0 [35,] 1 -1 0 0 0 0 [36,] 1 0 0 0 0 1 [37,] 1 0 0 0 0 -1 [38,] 1 0 0 0 1 0 [39,] 1 0 0 0 -1 0 fit<- lmFit(MA.avg,design, block=bioreps, cor=corfit$consensus) fit2 <-eBayes(fit) WTAR<- topTable(fit2, coef=2, adjust="BH") Is it sensible to make a coefficent for each of the batches in my design with my set of arrays? So three extra columns? I am unsure if I have enough information in my arrays for this, and I would appreciated your advice/ suggestions. I am especially concerned about how to treat the batch effect as the second batch has some background hybridisation issues from looking at the FE array images. Although they look OK on the QC in limma- just more constricted M values than the other arrays, I am concerned about them. I did remove the whole batch and ran the analysis with the remaining 29 arrays to gauge what effect they were having on the analysis and found that I got even less statistically significant genes. So, my questions are; 1. is the design matrix I constructed OK ? 2. How can I deal with the batch effect in my set off arrays. 3. Any other comments welcome! Thanks for any help you are able to give. Cheers Katrina [[alternative HTML version deleted]]

• 930 views

ADD COMMENT • link 15.1 years ago Katrina bell ▴ 30