Hello, I need to compare the performance of a model based on a fully Bayesian approach with the model implemented in the limma package. My data looks like this (protein-wise):
sub.1 healthy tissue.1
sub.1 healthy tissue.2
sub.1 healthy tissue.3
sub.1 diseased tissue.1
sub.1 diseased tissue.2
sub.1 diseased tissue.3
sub.2 healthy tissue.1
sub.2 healthy tissue.2
sub.2 healthy tissue.3
sub.2 diseased tissue.1
sub.2 diseased tissue.2
sub.2 diseased tissue.3
My model likelihood its expressed in this parametrization (where j cycles proteins, l cycles the tissue, s cycles the health status and i cycles the subjects):
y_jlsi = mu_jl + delta_i + theta_jls + epsilon_jlsi
where:
- mu_jl is a mixed effect composed of a fix part caused by the protein and a random effect based on the tissue
- delta_i is a random effect due to the subject
- theta_jls is the effect of interest
- epsilon_jlsi is the random residual
To adapt this model to limma package i reparameterized the model in this way:
y_jlsi = xi_j + gamma_li + theta_jls + epsilon_jlsi
now:
- xi_j it's the fixed effect of mu_jl
- gamma_li it's the sum of the random part of mu_jl and delta_i
From a statistical point of view, these models are equivalent in theta_jls estimation, but now I should be able to estimate the 2nd parametrization with the R code posted below.
I am also willing to search for literature to make sure alone that my assumptions are correct. However, I have difficulty in looking for theoretical papers of the statistical model.
The literature I have found and read is the following:
[1] Ritchie, Matthew E., et al. "limma powers differential expression analyses for RNA-sequencing and microarray studies." Nucleic acids research 43.7 (2015): e47-e47. [2] Smyth, Gordon K., et al. "Linear models for microarray and RNA-Seq data user’s guide." R, doi 10 (2015)
> Array = matrix(data[,"value"], ncol=60, nrow=30, byrow= TRUE)
>
> fix = as.factor(paste(data$state[1:60],data$location[1:60], sep="."))
> fix
[1] healthy.loc.1 healthy.loc.2 healthy.loc.3 diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1
[8] healthy.loc.2 healthy.loc.3 diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1 healthy.loc.2
[15] healthy.loc.3 diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1 healthy.loc.2 healthy.loc.3
[22] diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1 healthy.loc.2 healthy.loc.3 diseased.loc.1
[29] diseased.loc.2 diseased.loc.3 healthy.loc.1 healthy.loc.2 healthy.loc.3 diseased.loc.1 diseased.loc.2
[36] diseased.loc.3 healthy.loc.1 healthy.loc.2 healthy.loc.3 diseased.loc.1 diseased.loc.2 diseased.loc.3
[43] healthy.loc.1 healthy.loc.2 healthy.loc.3 diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1
[50] healthy.loc.2 healthy.loc.3 diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1 healthy.loc.2
[57] healthy.loc.3 diseased.loc.1 diseased.loc.2 diseased.loc.3
Levels: diseased.loc.1 diseased.loc.2 diseased.loc.3 healthy.loc.1 healthy.loc.2 healthy.loc.3
> block = paste(data$id[1:60],data$location[1:60], sep=".")
> block
[1] "sub.1.loc.1" "sub.1.loc.2" "sub.1.loc.3" "sub.1.loc.1" "sub.1.loc.2" "sub.1.loc.3" "sub.2.loc.1"
[8] "sub.2.loc.2" "sub.2.loc.3" "sub.2.loc.1" "sub.2.loc.2" "sub.2.loc.3" "sub.3.loc.1" "sub.3.loc.2"
[15] "sub.3.loc.3" "sub.3.loc.1" "sub.3.loc.2" "sub.3.loc.3" "sub.4.loc.1" "sub.4.loc.2" "sub.4.loc.3"
[22] "sub.4.loc.1" "sub.4.loc.2" "sub.4.loc.3" "sub.5.loc.1" "sub.5.loc.2" "sub.5.loc.3" "sub.5.loc.1"
[29] "sub.5.loc.2" "sub.5.loc.3" "sub.6.loc.1" "sub.6.loc.2" "sub.6.loc.3" "sub.6.loc.1" "sub.6.loc.2"
[36] "sub.6.loc.3" "sub.7.loc.1" "sub.7.loc.2" "sub.7.loc.3" "sub.7.loc.1" "sub.7.loc.2" "sub.7.loc.3"
[43] "sub.8.loc.1" "sub.8.loc.2" "sub.8.loc.3" "sub.8.loc.1" "sub.8.loc.2" "sub.8.loc.3" "sub.9.loc.1"
[50] "sub.9.loc.2" "sub.9.loc.3" "sub.9.loc.1" "sub.9.loc.2" "sub.9.loc.3" "sub.10.loc.1" "sub.10.loc.2"
[57] "sub.10.loc.3" "sub.10.loc.1" "sub.10.loc.2" "sub.10.loc.3"
>
> design = model.matrix(~0+fix)
>
>
> rand = duplicateCorrelation(object = Array,
+ design = design,
+ block = block)
>
> fit <- lmFit(Array, design, correlation = rand$consensus.correlation)
>
> cont.dif <- makeContrasts(loc.1 = (fixdiseased.loc.1-fixhealthy.loc.1),
+ loc.2 = (fixdiseased.loc.2-fixhealthy.loc.2),
+ loc.3 = (fixdiseased.loc.3-fixhealthy.loc.3),
+ levels = design)
>
> fit2 <- contrasts.fit(fit, cont.dif)
> fit3 <- eBayes(fit2)
Thank you very much for your answer,
I proceed by section:
Thanks for the advice! we reflect on the structure of the both models!
This is partially true (if I'm correct). My idea is that (under normality assumption) of the effects, we can decompose the mixed effect mu_jl distributed as N (xi_j, tau ^ 2) in its fixed part, and in its random part now written as zeta_l, distributed as N (0, tau ^ 2). now (mainly to adapt it to the functions of the limma package) I have combined the effect in delta_i ( N(0,lamnda^2)) with the effect in zeta_l creating that gamma_li (0, lamda^2+ tau^2) and the model posted above. Clearly, the variances will no longer be identifiable but these two models should be comparable on the estimates of the parameters of interest theta_jls and in particular each difference between theta_jl1 and theta_jl2 or the increase in the concentration of the protein in the inflamed tissue.
If my R code is wrong (the performance on simulations seems to work well) how do I represent one of these statistical models with limma? My main problem with all is how to express 2 different random effects in the model.
here I found a new paper, I am continuing my study!
thanks again for your help
limma only fits one random effect term and there isn't any way to make it do otherwise. Estimating a random effect for an combined blocking variable is not equivalent to estimating the sum of the individual variance components, especially when one of the variance components isn't identifiable anyway.
Thank you!