I'm trying to use arrayWeights together with duplicateCorrelation for the experimental design described below, which features technical replicates. I would greatly appreciate it if someone could confirm that what I'm doing is sensible.
I want to determine differential expression for mutant vs. control cell lines. The study features 5 cell lines (2 control and 3 mutant) with 4 'technical' (cell culture) replicates per line, as shown:
sample reps groups
Line1Rep1 1 Control
Line1Rep2 1 Control
Line1Rep3 1 Control
Line1Rep4 1 Control
Line2Rep1 2 Control
Line2Rep2 2 Control
Line2Rep3 2 Control
Line2Rep4 2 Control
Line3Rep1 3 Mutant
Line3Rep2 3 Mutant
Line3Rep3 3 Mutant
Line3Rep4 3 Mutant
Line4Rep1 4 Mutant
Line4Rep2 4 Mutant
Line4Rep3 4 Mutant
Line4Rep4 4 Mutant
Line5Rep1 5 Mutant
Line5Rep2 5 Mutant
Line5Rep3 5 Mutant
Line5Rep4 5 Mutant
I created a matrix as follows:
groups <- factor(rep(c("Control","Mutant"), times = c(8,12)))
reps <- factor(rep(c(1,2,3,4,5), each=4))
design <- model.matrix(~0+groups)
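As a quick sanity check on the factors above, here is a minimal base-R sketch (no limma needed) that cross-tabulates reps against groups to confirm that each cell line falls entirely within one group and has 4 replicates. The name nesting is just an illustrative variable, not part of the original analysis.

```r
# Rebuild the sample annotation from the table above
groups <- factor(rep(c("Control", "Mutant"), times = c(8, 12)))
reps <- factor(rep(1:5, each = 4))
design <- model.matrix(~0 + groups)

# Each cell line (row) should have all 4 replicates in exactly one group
nesting <- table(reps, groups)
print(nesting)

# model.matrix prefixes the column names with the factor name,
# which is why they are usually renamed to levels(groups) afterwards
print(colnames(design))
```

If a cell line showed counts in both columns of the table, the block factor would not be nested within the groups and the design would need another look.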
My normalized data are called res.filt2
colnames(design) <- levels(groups)
aw <- arrayWeights(res.filt2, design)
corfit <- duplicateCorrelation(res.filt2, design, block = reps)
fit <- lmFit(res.filt2, design, block = reps, correlation = corfit$consensus.correlation, weights = aw)
contrasts <- makeContrasts(Mutant-Control, levels = design)
contr.fit <- eBayes(contrasts.fit(fit, contrasts))
If I understand correctly, the code above calculates arrayWeights, taking into account both the reps and the groups. What I'm curious about is how lmFit then uses these array weights in combination with the blocks of technical replicates. Is it correct to combine array weights with the block parameter in lmFit in the way that I've done here?
Also, I notice that the duplicateCorrelation function has a "weights" parameter but I have not used it because (if I've understood correctly) it is for individual spot weights and not for array weights.
Many thanks for any help.
Dear Ryan, thanks for your advice.
I've repeated the analysis, this time using:
The total number of differentially expressed genes has now more than doubled, and it includes all of the DE genes identified in the previous analysis. Passing the weights to duplicateCorrelation seemed to preserve approximately the same ranking (when the probes are ranked by p-value) but makes the p-values a bit smaller.
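The exact call used in the repeated analysis is not shown above, so the following is only a sketch of what passing array weights to duplicateCorrelation might look like, assuming the same aw vector is supplied through its weights argument. It runs on simulated data: the matrix y is a hypothetical stand-in for res.filt2, and the simulated cell-line effect is an assumption made purely so that the replicates are correlated.

```r
library(limma)  # duplicateCorrelation also relies on the statmod package

set.seed(1)
groups <- factor(rep(c("Control", "Mutant"), times = c(8, 12)))
reps <- factor(rep(1:5, each = 4))

# Hypothetical stand-in for res.filt2: 200 probes x 20 arrays,
# with a shared per-cell-line effect so replicates are correlated
y <- matrix(rnorm(200 * 20), nrow = 200, ncol = 20)
y <- y + matrix(rnorm(200 * 5), nrow = 200, ncol = 5)[, as.integer(reps)]

design <- model.matrix(~0 + groups)
colnames(design) <- levels(groups)

# Estimate array weights, then pass them to both later steps
aw <- arrayWeights(y, design)
corfit <- duplicateCorrelation(y, design, block = reps, weights = aw)
fit <- lmFit(y, design, block = reps,
             correlation = corfit$consensus.correlation, weights = aw)

contrasts <- makeContrasts(Mutant - Control, levels = design)
contr.fit <- eBayes(contrasts.fit(fit, contrasts))
```

Supplying the same weights to both duplicateCorrelation and lmFit keeps the correlation estimate and the final fit consistent with each other; whether this matches the call actually used in the thread is an assumption.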