Entering edit mode
Ryan M Ghan
▴
20
@ryan-m-ghan-6448
Last seen 10.4 years ago
Hello limma community-
I am attempting to construct a contrast matrix that I can run in R,
using the limma bioconductor package, but I am not sure that I have
coded the contrast matrix correctly. A previous
post<https: stats.stackexchange.com="" questions="" 64249="" creating-="" contrast-matrix-for-linear-regression-="" in-r?newreg="add2674ca9d04b7eb85fad255b45b7f5"> on
stats.stackexchange.com, the bioconductor mailing list, and the limma
guide were helpful, but my two factorial design is more complicated
than what is illustrated there.
The first factor is the treatment, with two levels (control=c and
stress=s), and the second factor is the genotype, with five levels
(g1, g2, g3, g4, g5). Each genotype/treatment consists of 3-biological
replicates (30xsamples total). My dataset has already been normalized
and log2 transformed (Should I even start with log transformed data?).
It consists of 1208 proteins (based upon spectral counting for those
that care) that measures protein abundance differences in the five
genotypes and two treatments. The dataset is complete, meaning each
sample/condition has a datapoint, and appears to be normally
distributed (histogram and qqplots).
## Session information
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] limma_3.18.13
loaded via a namespace (and not attached):
[1] colorspace_1.2-4 dichromat_2.0-0 digest_0.6.4
ggplot2_0.9.3.1 grid_3.0.2 gtable_0.1.2
[7] labeling_0.2 MASS_7.3-30 munsell_0.4.2
plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5
[13] Rcpp_0.11.0 reshape2_1.2.2 scales_0.2.3
stringr_0.6.2 tools_3.0.2
## Subset of the data:
proteinID g1.s1 g1.s2 g1.s3 g1.c1 g1.c2 g1.c3 g2.s1 g2.s2 g2.s3
g2.c1 g2.c2 g2.c3 g3.s1 g3.s2 g3.s3 g3.c1 g3.c2 g3.c3 g4.s1 g4.s2
g4.s3 g4.c1 g4.c2 g4.c3 g5.s1 g5.s2 g5.s3 g5.c1 g5.c2 g5.c3
prot1 -9.70583694 -9.940059478 -9.764489183 -9.691937821
-9.547306096 -9.668928704 -9.821333234 -10.00376839 -9.843380585
-10.0789111 -9.958506961 -9.791583706 -10.04996359 -10.10279896
-10.0689715 -9.989303332 -10.05414639 -10.00619809 -9.907032795
-10.09700113 -10.00902876 -10.05603575 -10.26218387 -10.15527373
-9.88009858 -9.748974338 -9.730010667 -9.899956956 -9.773955101
-9.957684691
prot2 -9.810354967 -9.844319231 -9.896748977 -9.777040294
-9.821308434 -9.906798728 -9.832236541 -9.876359355 -9.935535795
-10.05991278 -9.831098077 -9.789738587 -10.08470861 -10.18515166
-10.10371621 -10.01971224 -9.977142493 -10.09055782 -9.739831978
-9.586647999 -9.949407778 -9.800183583 -9.83900565 -9.943521592
-9.99229056 -9.744850134 -9.794814509 -9.98542989 -9.766324886
-9.95430439
prot3 -11.70842601 -11.72521838 -11.90389475 -11.98273998
-11.915401 -11.88620205 -11.91603643 -11.96029519 -12.14926486
-12.23846499 -12.26650985 -11.84300821 -12.64562082 -12.41471031
-12.66462278 -12.577619 -12.90001898 -12.31577711 -11.66323243
-11.50283992 -11.4844068 -11.60402491 -11.95270942 -11.68245512
-12.32380181 -12.24294758 -12.23990879 -12.21563403 -12.33730369
-12.437377
prot4 -10.88942769 -11.16906693 -11.13942576 -11.31332257
-11.04718433 -11.11811122 -11.17687812 -11.12503828 -10.9724186
-11.16837945 -11.19642214 -10.96468249 -11.3975887 -11.28808753
-11.32778647 -11.34124725 -11.30972182 -11.29564372 -10.74370929
-10.92223539 -10.97733154 -11.40528844 -11.1238659 -11.15938598
-11.24937805 -10.8691392 -11.12478375 -10.75566728 -10.99485703
-11.09493115
prot5 -10.0102959 -9.936796529 -9.964629149 -9.842835973
-9.791578592 -9.773380518 -9.72290866 -9.715837804 -9.79028651
-9.951486129 -9.636225505 -9.820715987 -10.41899204 -10.25269382
-10.26949484 -10.02644184 -10.13120897 -10.20756299 -9.752087376
-9.687001368 -10.07111473 -9.815279198 -9.995624174 -9.993526894
-9.722360141 -9.551502595 -9.551929198 -9.724500546 -9.502769792
-9.65324573
prot6 -10.34051005 -10.27571947 -10.14968761 -10.17419023
-10.47812301 -10.11019796 -10.40447672 -10.15885481 -10.22900798
-10.26612428 -10.21920493 -10.17186677 -10.66125689 -10.95438025
-10.63751536 -10.65825783 -10.60857688 -10.78516027 -10.33890785
-10.49726978 -10.47100414 -10.64742463 -10.78932619 -10.5318634
-10.26494688 -9.975182247 -10.24870036 -10.2356165 -10.26689552
-10.13061368
prot7 -10.24930429 -10.37307132 -10.03573128 -10.29985129
-9.991216794 -10.05854902 -10.1958704 -10.30549818 -10.2078462
-10.28795766 -10.23314344 -10.23897922 -9.997472306 -10.27461285
-10.20805608 -10.06261332 -10.24876706 -10.12643737 -9.906088449
-10.07316322 -10.23545822 -10.30970717 -10.40745591 -10.36432166
-10.22423532 -10.25703553 -10.44925268 -9.902554721 -9.891163766
-10.0695915
prot8 -10.98782595 -10.84184533 -10.76496107 -10.68290092
-10.55763113 -10.91736394 -10.87505278 -10.76474268 -10.58319007
-10.87547281 -10.71948079 -10.95011831 -10.99753277 -11.061728
-10.8852958 -10.86371208 -10.96638746 -11.24112703 -10.46809937
-10.78446288 -10.71240489 -10.80931259 -10.6598091 -10.54801115
-10.70612733 -10.7339808 -10.8184854 -10.53370359 -10.47323989
-10.62675183
prot9 -8.83857166 -8.736344638 -8.743339515 -8.8152675
-8.743086044 -8.719612156 -8.898093257 -8.902781886 -9.071574958
-8.945970659 -8.862394746 -8.825061244 -8.82313363 -9.161452294
-8.905846232 -8.940119002 -9.024995852 -8.943721201 -8.768488159
-8.802155458 -8.721187011 -8.84850416 -8.931513624 -8.86743278
-8.856904592 -8.675257846 -8.900833162 -8.676117406 -8.758661701
-8.925717389
prot10 -10.65297508 -10.74532307 -10.65940071 -10.36671791
-10.50431649 -10.54915637 -11.07154003 -10.79884265 -10.97164196
-11.1201714 -11.14821342 -10.9254445 -10.92875918 -10.90806369
-10.77581175 -11.2324716 -11.31360896 -11.01070959 -11.04450945
-10.89694291 -10.76865867 -10.92983387 -11.07365287 -11.43888216
-11.14948441 -10.69611194 -10.85827316 -10.64470128 -10.79046792
-10.86048168
## Code that I am attempting to utilize:
proteins.mat <- as.matrix(proteins.df)
treat = c("g1.s","g1.c","g2.s","g2.c","g3.s","g3.c","g4.s","g4.c",
"g5.s","g5.c")
factors = gl(10,3,labels=treat)
design <- model.matrix(~0+factors)
colnames(design) <- treat
## Here is the design for my model:
> design
g1.s g1.c g2.s g2.c g3.s g3.c g4.s g4.c g5.s g5.c
1 1 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0 0
4 0 1 0 0 0 0 0 0 0 0
5 0 1 0 0 0 0 0 0 0 0
6 0 1 0 0 0 0 0 0 0 0
7 0 0 1 0 0 0 0 0 0 0
8 0 0 1 0 0 0 0 0 0 0
9 0 0 1 0 0 0 0 0 0 0
10 0 0 0 1 0 0 0 0 0 0
11 0 0 0 1 0 0 0 0 0 0
12 0 0 0 1 0 0 0 0 0 0
13 0 0 0 0 1 0 0 0 0 0
14 0 0 0 0 1 0 0 0 0 0
15 0 0 0 0 1 0 0 0 0 0
16 0 0 0 0 0 1 0 0 0 0
17 0 0 0 0 0 1 0 0 0 0
18 0 0 0 0 0 1 0 0 0 0
19 0 0 0 0 0 0 1 0 0 0
20 0 0 0 0 0 0 1 0 0 0
21 0 0 0 0 0 0 1 0 0 0
22 0 0 0 0 0 0 0 1 0 0
23 0 0 0 0 0 0 0 1 0 0
24 0 0 0 0 0 0 0 1 0 0
25 0 0 0 0 0 0 0 0 1 0
26 0 0 0 0 0 0 0 0 1 0
27 0 0 0 0 0 0 0 0 1 0
28 0 0 0 0 0 0 0 0 0 1
29 0 0 0 0 0 0 0 0 0 1
30 0 0 0 0 0 0 0 0 0 1
attr(,"assign")
[1] 1 1 1 1 1 1 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$factors
[1] "contr.treatment"
## My contrast model. I want to test for interaction, differences
between genotypes, and to see if specific genotypes respond
differently to the treatment from one another:
cmtx <- makeContrasts(
GenotypevsTreatment=(g1.s-g1.c)-(g2.s-g2.c)-(g3.s-g3.c)-(g4.s-g4
.c)-(g5.s-g5.c),
genotype=(g1.s+g1.c)-(g2.s+g2.c)-(g3.s+g3.c)-(g4.s+g4.c)-(g5.s+g5.c),
Treatment=(g1.s+g2.s+g3.s+g4.s+g5.s)-(g1.c+g2.c+g3.c+g4.c+g5.c),
levels=design)
## What my contrast model looks like, but I don't think this is
correct:
> cmtx
Contrasts
Levels GenotypevsTreatment Genotype Treatment
g1.s 1 1 1
g1.c -1 1 -1
g2.s -1 -1 1
g2.c 1 -1 -1
g3.s -1 -1 1
g3.c 1 -1 -1
g4.s -1 -1 1
g4.c 1 -1 -1
g5.s -1 -1 1
g5.c 1 -1 -1
## Fitting the linear model by empirical bayes statistics for
differential expression:
fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design), cmtx))
topTable(fit, adjust.method="BH")
## The below topTable proteins are the same as the subset of data from
above:
> topTable(fit, adjust.method="BH")
GenotypevsTreatment Genotype Treatment AveExpr
F P.Value adj.P.Val
prot1 -0.40786338 60.30918 0.073054723 -9.918822 17308.55
1.124646e-39 1.232079e-36
prot2 -0.09255219 59.60864 0.061701713 -9.897968 15801.43
3.304533e-39 1.232079e-36
prot3 -0.23880357 73.48557 0.536672827 -12.090016 15650.65
3.701463e-39 1.232079e-36
prot4 -0.11834000 66.76931 0.305471823 -11.122034 15522.46
4.079731e-39 1.232079e-36
prot5 -0.15210172 59.21509 -0.183849274 -9.876144 14734.51
7.556112e-39 1.423908e-36
prot6 -0.15761118 62.87467 0.155340561 -10.389362 14565.87
8.658504e-39 1.423908e-36
prot7 -0.03886438 61.15652 -0.166795475 -10.182834 14551.88
8.757515e-39 1.423908e-36
prot8 -0.10425341 64.63523 -0.186904167 -10.780359 14461.18
9.429854e-39 1.423908e-36
prot9 -0.03426380 53.48057 0.007403722 -8.854471 13713.49
1.767090e-38 2.021378e-36
prot10 -0.75250251 66.62646 0.327497120 -10.894506 13480.51
2.164184e-38 2.021378e-36
I think I am naively constructing the contrast matrix. The result for
Genotype (above) looks incorrect to me. Ideally, I would like to
figure this out because I wish to apply similar modeling techniques to
an RNAseq dataset from the same experimental samples that I used for
my protein work.
Also, in order to make the correct comparisons, do I also need to
employ the duplicateCorrelation function (pg. 50, lima guide 9.7
Multi-level Experiments)? Something like this:
corfit <- duplicateCorrelation(proteins.mat, design)
> corfit$consensus
[1] -0.7888584
fit <- eBayes(contrasts.fit(lmFit(proteins.mat, design,
corrleation=corfit$consensus), cmtx))
Any insight/corrections would be greatly appreciated, and if I have
forgotten some information that would aid in helping me with this
problem I am happy to cooperate.
Cheers,
Ryan
Ryan M. Ghan
Graduate Research Assistant
Department of Biochemistry and Molecular Biology, MS 330
University of Nevada, Reno
Reno, NV 89557
Howard 205
Lab: (775) 784-4225
rghan@unr.edu
[[alternative HTML version deleted]]