Question

Use of arrayWeights with two treatment groups

0

Entering edit mode

Emma • 0

@57c383ff

Last seen 3.0 years ago

United Kingdom

I am analysing some microarray data. For a set of participants, I have their RNA expression before an intervention ("Baseline") and after the intervention ("Followup"). I want to know which genes are differentially expressed between the two timepoints. As suggested in the Limma user guide when working with human in vivo data, I have used the arrayWeights() function to weight the arrays according to their quality. From what I understand, this function performs a regression on all samples and then weights the samples according to their distance from this regression. However, as there are two treatment groups ("Baseline" and "Followup"), I would like to know if it is suitable to use a single regression in this way, as I am expecting there to be differences between the two groups. Does it make sense instead to perform two array weightings - one for the "Baseline" samples and one for the "Followup" samples? If so, how would I go about altering my code to incorporate this?

My current code:

 y <- neqc(x)
design <- model.matrix(~Participant+Time)
arrayw <- arrayWeights(y)
fitw <- lmFit(y,design,weights=arrayw)
fitw <- eBayes(fitw) 
topTable(fitw,coef="TimeFollowup")

Thanks in advance for any help!

Microarray arrays limma MicroarrayData • 2.2k views

ADD COMMENT • link updated 3.8 years ago by Gordon Smyth 52k • written 3.9 years ago by Emma • 0

score 2 · Accepted Answer · 2021-06-09

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 38 minutes ago

WEHI, Melbourne, Australia

Your code is fine. Differential expression does not affect the quality weights.

I'm not sure where you have read about a regression on all the samples, but that is not how the weights are computed.

ADD COMMENT • link 3.9 years ago Gordon Smyth 52k

0

Entering edit mode

Thanks for the answer! Just wondering, how does the function determine array quality?

ADD REPLY • link 3.9 years ago Emma • 0

1

Entering edit mode

There is a description and references in the help page.

Details:

     The relative reliability of each array is estimated by measuring
     how well the expression values for that array follow the linear
     model. Arrays that tend to have larger residuals are assigned
     lower weights.

     The basic method is described by Ritchie et al (2006) and the
     extension to custom variance models by Liu et al (2015). A
     weighted linear model is fitted to the expression values for each
     gene. The variance model is fitted to the squared residuals from
     the linear model fit and is updated either by full REML scoring
     iterations ('method="reml"') or using an efficient gene-by-gene
     update algorithm ('method="genebygene"'). The final estimates of
     these array variances are converted to weights. The gene-by-gene
     algorithm is described by Ritchie et al (2006) while the REML
     algorithm is an adaption of the algorithm of Smyth (2002).

     For stability, the array weights are squeezed slightly towards
     equality. This is done by adding a prior likelihood corresponding
     to unit array weights equivalent to 'prior.n' genes. The
     gene-by-gene algorithm is started from the prior genes while the
     REML algorithm adds the prior to the log-likelihood derivatives.

     By default, 'arrayWeights' chooses between the REML and
     gene-by-gene algorithms automatically ('method="auto"'). REML is
     chosen if there are no prior weights or missing values and
     otherwise the gene-by-gene algorithm is used.

     The input 'object' is interpreted as for 'lmFit' and 'getEAWP'. In
     particular, the arguments 'design' and 'weights' will be extracted
     from the data 'object' if available and do not normally need to be
     set explicitly in the call; if any of these are set in the call
     then they will over-ride the slots or components in the data
     'object'.

Value:

     A numeric vector of array weights, which multiply to 1.

Author(s):

     Matthew Ritchie and Gordon Smyth

References:

     Liu, R., Holik, A. Z., Su, S., Jansz, N., Chen, K., Leong, H. S.,
     Blewitt, M. E., Asselin-Labat, M.-L., Smyth, G. K., Ritchie, M. E.
     (2015). Why weight? Combining voom with estimates of sample
     quality improves power in RNA-seq analyses. _Nucleic Acids
     Research_ 43, e97. <URL:
     http://nar.oxfordjournals.org/content/43/15/e97>

     Ritchie, M. E., Diyagama, D., Neilson, van Laar, R., J., Dobrovic,
     A., Holloway, A., and Smyth, G. K. (2006). Empirical array quality
     weights in the analysis of microarray data. _BMC Bioinformatics_
     *7*, 261. <http://www.biomedcentral.com/1471-2105/7/261>

     Smyth, G. K. (2002). An efficient algorithm for REML in
     heteroscedastic regression. _Journal of Computational and
     Graphical Statistics_ *11*, 836-847. <URL:
     http://www.statsci.org/smyth/pubs/remlalgo.pdf>

ADD REPLY • link 3.9 years ago James W. MacDonald 68k

0

Entering edit mode

Hi, Thanks a lot for your help!

Apologies, I'm still a bit confused -

As you write, "the relative reliability of each array is estimated by measuring how well the expression values for that array follow the linear model", but if I have two different treatment groups then I would expect the expression values to be different between the two groups - by weighting them according to how well they fit a linear model based on both treatment groups combined aren't I essentially adjusting out some of the effect of my treatment?

Thanks again for your help!

ADD REPLY • link 3.8 years ago Emma • 0

0

Entering edit mode

No, quite the opposite. The method clarifies the treatment effect rather than removing it.

ADD REPLY • link 3.8 years ago Gordon Smyth 52k