Entering edit mode
Ramon Diaz
★
1.1k
@ramon-diaz-159
Last seen 10.2 years ago
Dear Gordon, Naomi, and BioC list,
The issue of how to deal with technical replicates (such as those
obtained
when we do dye-swaps of the same biological samples in cDNA arrays)
has come
up in the BioC list several times. What follows is a short summary,
with
links to mails in BioC plus some questions/comments.
There seem to be three major ways of approaching the issue:
1. Treat the technical replicates as ordinary replicates
*************************************************************
E.g., Gordon Smyth in sept. 2003
(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-September/0
02405.html)
However, this makes me (and Naomi Altman ---e.g.,
https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-December/003
340.html)
uneasy (tech. reps. are not independent biological reps. which leads
to the
usual inflation of dfs and deflation of se).
I guess part of the key to Gordon's suggestion is his comment that
even if the
s.e. are slightly underestimated, the ordering is close to being the
optimal
one. But I don't see why the ordering out to be much worse if we use
the
means of technical replicates as in 3. below. (Haven't done the math,
but it
seems that, specially in the pressence of strong tech. reps.
covariance and
small number of independent samples we ought to be better of using the
means
of the tech. reps).
2. Mixed effects models with subject as random effect (e.g., via lme).
**********************************************************************
********
In late August of 2003 I asked about these issues, and Gordon seemed
to agree
that trying the lme approach could be a way to go.
(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-August/0022
24.html).
However, in September, I posted an aswer and included code that, at
least for
some cases, shows potential problems with using lme when the number of
technical replicates is small.
(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-September/0
02430.html)
Nevertheless, Naomi Altman reports using lme/mixed models in a couple
of
emails
(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-December/00
3191.html;
(https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-January/003
481.html).
After reading about randomizedBlock (package statmod) in a message in
BioC (I
think from Gordon), I have tried aggain the mixed models approach,
since with
tech. reps and no other random effects, we should be able to use
randomizedBlock. Details in 5. below:
3. Take the average of the technical replicates
****************************************************
Except for being possibly conservative (and not estimating tech. reps.
variance component), I think this is a "safe" procedure.
This is what I have ended up doing routinely after my disappointing
tries with
lme
(https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-September/0
02430.html)
and what Bill Kenworthy seemed to end up doing
(https://www.stat.math.ethz.ch/pipermail/bioconductor/2004-January/003
493.html).
I think this is also what is done at least some times in literature
(e.g.,
Huber et al., 2002, Bioinformatics, 18, S96--S104 [the vsn paper]).
*********
4. Dealing with replicates in future versions of limma
***********************************************************
Now, in Sept. 2004 Gordon mentioned that an explicit treatment of
tech. reps.
would be in a future version of limma
( https://www.stat.math.ethz.ch/pipermail/bioconductor/2003-September/
002411.html)
and I wonder if Gordon meant via mixed-effects models, or some other
way, or
if there has been some progress in this area.
5. Using randomizedBlock
*****************************
In a simple set up of control and treatment with dye-swaps, I have
done some
numerical comparisons of the outcome of a t-test on the mean of the
technical
replicates with lme and with randomizedBlock. [The function is
attached]. The
outcome of the t-test of the means of replicates and randomizedBlock,
in
terms of the t-statistic, tends to be the same (if we "positivize" the
dye
swaps). The only difference, then, lies in the df we then use to put a
p-value on the statistic. But I don't see how we can use the dfs from
randomizedBlock: they seem way too large. Where am I getting lost?
Best,
R.
--
Ram?n D?az-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncol?gicas (CNIO)
(Spanish National Cancer Center)
Melchor Fern?ndez Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900
http://bioinfo.cnio.es/~rdiaz
PGP KeyID: 0xE89B3462
(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)
-------------- next part --------------
library(nlme)
library(limma)
library(statmod)
options(contrasts = c("contr.helmert", "contr.poly"))
f3 <- function(effect.size = 0,
s.biol = 2, s.tech = 1,
biological.repls = 5, n.tech.reps = 10){
## For one-sample comparisons, with dye-swaps
## generates data, and fits models using both limma
## and lme and one-sample t-tests on the mean of the
## technical replicates
if(getOption("contrasts")[1] != "contr.helmert")
stop("You need to set Helmert contrasts in options")
if(n.tech.reps %% 2) stop("Need even number of n.tech.reps")
## simulate data
tech.rep.effect <- rnorm(biological.repls, mean = 0, sd = s.biol)
tech.rep.effect <- rep(tech.rep.effect,
rep(n.tech.reps, biological.repls))
data.l <- data <-
rnorm(biological.repls * n.tech.reps, mean = 0, sd = s.tech) +
tech.rep.effect + effect.size
dim(data.l) <- c(1, biological.repls * n.tech.reps)
## design matrices, etc.
tech.reps <-
factor(rep(1:biological.repls, rep(n.tech.reps,
biological.repls)))
dm1 <- model.matrix(data ~ tech.reps)
dye.swap.codes <- rep(c(-1, 1), (n.tech.reps *
biological.repls)/2)
data.sp <- data.l * dye.swap.codes
data.s <- data * dye.swap.codes
## model fitting
tmp1 <- lm.series(data.l, dm1)
tmp2 <- lm.series(data.sp, dm1 * dye.swap.codes)
tmp3 <- summary(lme(data ~ 1, random = ~ 1|tech.reps,
control = list(tolerance = 1e-10, msTol = 1e-10)))
tmp4 <- summary(lme(data.s ~ -1 + dye.swap.codes,
random = ~ 1|tech.reps,
control = list(tolerance = 1e-10, msTol =
1e-10)))
tmp5 <- randomizedBlock(data ~ 1, random = tech.reps,
tol = 1e-10)
tmp6 <- randomizedBlock(data.s ~ -1 + dye.swap.codes,
random = tech.reps,
tol = 1e-10)
tmp1 <- c(tmp1$coefficients[[1]], tmp1$stdev.unscaled[[1]],
tmp1$sigma, tmp1$sigma *tmp1$stdev.unscaled[[1]],
tmp1$df)
names(tmp1) <- c("coeff", "unscaled.stdev", "sigma", "scaled se",
"df")
tmp2 <- c(tmp2$coefficients[[1]], tmp2$stdev.unscaled[[1]],
tmp2$sigma, tmp2$sigma *tmp2$stdev.unscaled[[1]],
tmp2$df)
rB <- rbind(c(tmp5$coefficients[[1]], tmp5$se.coefficients,
tmp5$df.residual, sqrt(tmp5$varcomp)),
c(tmp6$coefficients[[1]], tmp6$se.coefficients,
tmp6$df.residual, sqrt(tmp6$se.varcomp)))
rB <- cbind(rB, rB[, 1]/rB[, 2])
colnames(rB)[c(1:3, 6)] <- c("coeff", "se.coeff", "df", "t-value")
rownames(rB) <- c("rB.positivized", "rB.swap.design")
cat("\n\n *************\n\n")
mean.by.tech.rep <- tapply(data, tech.reps, mean)
the.t.results <- unlist(t.test(mean.by.tech.rep)[1:6])
the.t.se <- sqrt(var(mean.by.tech.rep)/length(unique(tech.reps)))
lme.and.t <- rbind(lme.positivized.t.table = tmp3$tTable,
lme.swap.design.t.table = tmp4$tTable,
t.test.results = c(the.t.results[6], the.t.se,
the.t.results[c(2, 1, 3)]))
rownames(lme.and.t) <- c("lme.positivized", "lme.swap", "t.test")
return(list(
lme.and.limma = rbind(
lm.series.positivized = tmp1,
lm.series.swap.design = tmp2,
lme.positivized = c(tmp3$coefficients[[1]],
sqrt(tmp3$varFix)/tmp3$sigma, tmp3$sigma,
sqrt(tmp3$varFix),
tmp3$tTable[[3]]),
lme.swap.design = c(tmp4$coefficients[[1]],
sqrt(tmp4$varFix)/tmp4$sigma, tmp4$sigma,
sqrt(tmp4$varFix),
tmp4$tTable[[3]])),
lme.and.t,
randomizedBlockaAnalyses = rB))
}
## example call:
f3(2, 2, 1, 3, 2)
f3(0, 2, 1, 3, 2)
f3(0, 2, 4, 3, 2)
f3(0, 12, 4, 3, 2)
f3(2, 12, 4, 3, 2)
f3(10, 12, 4, 3, 2)