Aaron Mackey
My collaborators have an experimental design in which cells are treated experimentally with two conditions, and they naturally wish to know the differences in response between the two. Moreover, the experiments are set up in pairs of treatments, with each pair produced from the same "batch" of cells, inducing a natural pairing that we might want to include in the limma design. We would do this to take advantage of expected correlations in gene expression due to the source of cells in each experiment.
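For concreteness, here is a sketch of what the two designs might look like in base R, assuming a hypothetical three-batch experiment with two conditions per batch (the factor names are illustrative, not from our actual data; in limma either matrix would then be passed to lmFit):

```r
# Hypothetical layout: 3 cell batches, each contributing one sample
# per condition, columns ordered batch-by-batch.
batch <- factor(rep(1:3, each = 2))           # the "batch" pairing
treat <- factor(rep(c("A", "B"), times = 3))  # the two conditions

design_unpaired <- model.matrix(~ treat)          # treatment effect only
design_paired   <- model.matrix(~ batch + treat)  # blocks on batch (pairing)
```

The paired design spends two extra coefficients (hence degrees of freedom) on the batch blocks, which is exactly the trade-off explored below.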
However, when we run the analyses with either a paired or unpaired design, we find that the unpaired statistics are far more significant (~1000 probesets at FDR < 5%) than with the paired design (~100), which implies that there is not enough correlation across pairs, at least relative to the induced treatment effects. A bit stumped at first, I finally confirmed for myself that even in the presence of strong correlation, a larger treatment effect will remain more significant with an unpaired design:
> wt <- c(0.9, 1.0, 1.2)
> mean(wt)
[1] 1.033333
> mu <- c(6.2, 6.1, 5.9)
> mean(mu)
[1] 6.066667
> mean(mu) - mean(wt)
[1] 5.033333
> mean(mu-wt)
[1] 5.033333
In fact, no matter how you pair up mu and wt, you will always get 5.0333 as the paired fold change. However, the variance may change, depending on how correlated mu and wt are (it is this correlation that we are trying to take advantage of by pairing):
> cor(wt, mu)
[1] -1
> sd(mu-wt)
[1] 0.305505
> mu2 <- sort(mu)
> wt2 <- sort(wt)
> cor(wt2, mu2)
[1] 0.9285714
> sd(mu2-wt2)
[1] 0.05773503
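The identity behind those sd() values is worth making explicit: for paired differences, var(mu - wt) = var(mu) + var(wt) - 2*cov(mu, wt), so positive correlation subtracts from the paired variance and negative correlation adds to it. A quick check on the vectors above:

```r
wt <- c(0.9, 1.0, 1.2)
mu <- c(6.2, 6.1, 5.9)
# var of the paired differences equals the sum of the variances
# minus twice the covariance:
check1 <- all.equal(var(mu - wt), var(mu) + var(wt) - 2 * cov(mu, wt))
mu2 <- sort(mu); wt2 <- sort(wt)
check2 <- all.equal(var(mu2 - wt2), var(mu2) + var(wt2) - 2 * cov(mu2, wt2))
```

With cor = -1 the covariance term enlarges the paired variance; with cor = 0.93 it shrinks it, which is exactly the sd(mu-wt) vs sd(mu2-wt2) contrast above.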
Now let's see how this affects t-test significance:
> t.test(mu, wt, paired=F, var.equal=T)
Two Sample t-test
data: mu and wt
t = 40.3564, df = 4, p-value = 2.253e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.687050 5.379617
sample estimates:
mean of x mean of y
6.066667 1.033333
> t.test(mu, wt, paired=T, var.equal=T)
Paired t-test
data: mu and wt
t = 28.5363, df = 2, p-value = 0.001226
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.274417 5.792250
sample estimates:
mean of the differences
5.033333
> t.test(mu2, wt2, paired=F, var.equal=T)
Two Sample t-test
data: mu2 and wt2
t = 40.3564, df = 4, p-value = 2.253e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.687050 5.379617
sample estimates:
mean of x mean of y
6.066667 1.033333
> t.test(mu2, wt2, paired=T, var.equal=T)
Paired t-test
data: mu2 and wt2
t = 151, df = 2, p-value = 4.385e-05
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
4.889912 5.176755
sample estimates:
mean of the differences
5.033333
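Those two t statistics can be reconstructed by hand in base R, which makes the mechanics transparent (the paired test uses the sd of the differences with n-1 = 2 df, while the unpaired test uses the pooled variance with 2n-2 = 4 df):

```r
wt2 <- sort(c(0.9, 1.0, 1.2))
mu2 <- sort(c(6.2, 6.1, 5.9))
n <- 3

# Paired: t = mean(d) / (sd(d) / sqrt(n)), df = n - 1
d <- mu2 - wt2
t_paired <- mean(d) / (sd(d) / sqrt(n))

# Unpaired, equal variances: pooled variance, df = 2n - 2
sp2 <- ((n - 1) * var(mu2) + (n - 1) * var(wt2)) / (2 * n - 2)
t_unpaired <- (mean(mu2) - mean(wt2)) / sqrt(sp2 * (1/n + 1/n))
```

This reproduces t_paired = 151 and t_unpaired = 40.3564 from the t.test() output above; pairing shrank the denominator a lot, but only the denominator.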
In the first case, when wt & mu were anti-correlated, the unpaired t-test gave much better P values; adding the pairing info made the variation in mu-wt larger, and so the P value got worse (the t statistic was smaller; also, for the same t statistic, the smaller df in the paired test makes the P value worse).
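The df effect alone is easy to see with the t distribution's tail probabilities: for an identical t statistic, fewer degrees of freedom means heavier tails and so a larger (worse) two-sided P value:

```r
# Same t statistic (t = 4), different degrees of freedom:
p_df2 <- 2 * pt(-4, df = 2)  # paired-style df
p_df4 <- 2 * pt(-4, df = 4)  # unpaired-style df
# p_df2 is several times larger than p_df4
```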
In the second case, when wt2 & mu2 were strongly correlated, the paired t-test was still very good, and had a much higher t statistic than the unpaired test, but the P value was still not quite as good as the unpaired -- this is due to the drop in df. Much of this has to do with the large difference between the two groups; if I make the difference between mu and wt a bit smaller, without changing the correlation structure:
> mu3 <- mu2 - 4.5 # cor(wt2, mu3) == cor(wt2, mu2)
> t.test(mu3, wt2, paired=F, var.equal=T)
Two Sample t-test
data: mu3 and wt2
t = 4.2762, df = 4, p-value = 0.01289
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1870498 0.8796169
sample estimates:
mean of x mean of y
1.566667 1.033333
> t.test(mu3, wt2, paired=T, var.equal=T)
Paired t-test
data: mu3 and wt2
t = 16, df = 2, p-value = 0.003884
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.3899116 0.6767551
sample estimates:
mean of the differences
0.5333333
Then, finally, you see a change in the expected direction: the paired test is more significant than the unpaired test.
So, the question is -- how might you convince yourself (or a savvy and skeptical reviewer, for that matter) that your deliberate removal of pairing from your design is the statistically valid approach? My own thoughts were to show a distribution of observed correlations across pairings, to demonstrate that the within-pairing variances were much smaller than the between-treatment variances of interest.
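One way that diagnostic might look (a sketch only -- the matrix layout, names, and simulated data here are all assumptions, not our real dataset): for a genes-by-samples matrix with the wt columns followed by the matching mu columns in batch order, compute each gene's correlation across the pairs and inspect the distribution:

```r
set.seed(1)
n_pairs <- 4
# Toy stand-in for real expression data: 100 genes x 2*n_pairs samples,
# columns ordered wt_1..wt_n, mu_1..mu_n (hypothetical layout).
expr <- matrix(rnorm(100 * 2 * n_pairs), nrow = 100)
wt_mat <- expr[, 1:n_pairs]
mu_mat <- expr[, n_pairs + 1:n_pairs]

# Per-gene correlation between the two conditions across batches:
gene_cors <- sapply(seq_len(nrow(expr)),
                    function(i) cor(wt_mat[i, ], mu_mat[i, ]))
hist(gene_cors, main = "Within-pair correlation per gene")
```

If that histogram sits mostly near zero (as the simulated noise here does), it would support the claim that the batch pairing contributes little correlation worth modeling.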
Thanks for your time and attention,
-Aaron
--
Aaron J. Mackey, PhD
Assistant Professor
Center for Public Health Genomics
University of Virginia
amackey@virginia.edu
http://www.cphg.virginia.edu/mackey