Difference between one-stage and two-stage analyses of multiple microarray experiments in limma
1
0
Entering edit mode
mwesthues • 0
@mwesthues-8547
Last seen 9.4 years ago
Germany

I have multiple two-color microarray experiments, analyzed with the same technology, that are  connected through common genotypes. I want to compute best linear unbiased estimates  (BLUEs)/adjusted values for each RNA source across all experiments. I have tried a combined analysis  in limma in a single step and a two-step approach using a linear mixed model package  (lme4 in R). The resulting BLUEs are highly correlated, but their estimated standard deviations of  the errors are only moderately correlated. How can this be explained?

Full description and question

The dataset that I would like to analyze in limma has the following properties:

  • Seven separate experiments, which share a common chip and image-reading technology.
  • All RNA sources across experiments are biological replicates.
  • All RNA sources that occur mutiple times on the same array are technical replicates.
  • The different experiments are connected via some common genotypes, but not via common arrays.
  • The hybridization pairs were selected in a way that ensures that all RNA sources are connected (interwoven loop design).
  • A two-color design was used, where, for each experiment, every genotype was labeled once with "Cy3" and once with "Cy5", respectively.


Our goal is to compute best linear unbiased estimates (BLUE)/adjusted values of the gene  expressions for each RNA source. We are NOT interested in the differences of gene expressions  between the RNA sources as it is usually the case. I see two options to achieve this goal of ours:

1. A one-step approach where all arrays from every experiment are included in the limma analysis. In order to account for the existence of multiple experiments, an "Experiment"-effect is added by augmenting the design matrix by the number of experiments.

    rnasources <- uniqueTargets(targets)
    design <- modelMatrix(targets, ref = rnasources[1])
    fact_mod <- model.matrix(~0 + Experiment, dd)
    design <- cbind(design, fact_mod)

 

, where dd is a data frame that contains the levels of the arrays dependent on their experiment membership.

2. A two-step approach where each experiment is analyzed separately, the BLUEs are extracted    from the individual experiment and enter a linear mixed model with an "Experiment" effect (for example lme4 or asreml).

I have investigated both analyses and there is a very high correlation (r = 0.97) between the  LS-means from the one-stage and the two-stage analysis, respectively. However, the correlation  between the estimated standard deviations of the errors (sigma both in lme4 and limma) is merely 0.7. Now I am wondering about the reason for those differences. Interestingly, the correlation  between the best linear unbiased predictors (BLUP) of the random "Experiment" effect in the  two-stage model and the best linear unbiased estimates (BLUE) of the fixed "Experiment" effect in  the one-stage model is close to zero, which I did not expect in advance. Some assumptions:

  1. The model matrix for the one-stage analysis is incorrect. I have read in other posts that it  is fine to augment the design matrix by adding different levels of a "Batch" effect. I  would consider this case to be comparable to mine, so I do not immediately see a  problem here, but please correct me if I am wrong.
  2. The precision of the sigmas is considerably higher for the one-stage approach compared  to the two-stage approach, since there is simply more information available for the  estimation of the BLUEs.

 

r limma limma design matrix • 1.9k views
ADD COMMENT
0
Entering edit mode

I assume that you're referring to the BLUEs of the coefficients when you're talking about "RNA sources". The best estimate of the observed expression value for each sample would be, well, the observed expression value.

ADD REPLY
0
Entering edit mode

Yes, I am referring to the log-ratio between a given RNA source and my chosen reference for each gene.

ADD REPLY
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 19 hours ago
The city by the bay

I'm not familiar with lme4, but if you're plugging in the estimated coefficients directly from limma into the lme4 functions, then I presume that you're treating the estimates as known observations. This doesn't account for the uncertainty of estimation in the first fit with limma. I suppose that, if such uncertainty was included (possibly as precision weights), this would affect the standard deviations that you get from lme4.

Another reason may be that limma treats Experiment as a fixed effect, i.e., the corresponding coefficients are not considered to originate from some underlying distribution. Treating it as a random effect would likely change the standard deviations of all coefficient estimates, as the certainty of those estimates would vary according to the likelihood of obtaining the estimated Experiment coefficients from the underlying distribution.

ADD COMMENT
0
Entering edit mode

Thanks for your answer! That sounds reasonable.

  1. My assumption is that there is no option to treat such experimental effects in limma as random or am I wrong about that?
  2. Would you change anything about my one-stage approach or does this also seem correct to you?
ADD REPLY
1
Entering edit mode

You could try blocking on Experiment with duplicateCorrelation rather than putting it in the design matrix. Recall that the latter approach does not consider the variability between Experiment coefficients. With the duplicateCorrelation method, this variability is now incorporated into the variance estimates for each gene. This might be closer to a random effects model, as the variability between experiments will be reflected in the estimation of the other coefficients. However, I'm not enough of an expert in this to be sure about it, so take it with a grain of salt.

ADD REPLY

Login before adding your answer.

Traffic: 704 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6