I am analysing some microarray data. For a set of participants, I have their RNA expression before an intervention ("Baseline") and after the intervention ("Followup"). I want to know which genes are differentially expressed between the two timepoints. As suggested in the Limma user guide when working with human in vivo data, I have used the arrayWeights() function to weight the arrays according to their quality. From what I understand, this function performs a regression on all samples and then weights the samples according to their distance from this regression. However, as there are two treatment groups ("Baseline" and "Followup"), I would like to know if it is suitable to use a single regression in this way, as I am expecting there to be differences between the two groups. Does it make sense instead to perform two array weightings - one for the "Baseline" samples and one for the "Followup" samples? If so, how would I go about altering my code to incorporate this?
My current code:
y <- neqc(x)
design <- model.matrix(~Participant+Time)
arrayw <- arrayWeights(y)
fitw <- lmFit(y,design,weights=arrayw)
fitw <- eBayes(fitw)
topTable(fitw,coef="TimeFollowup")
Thanks in advance for any help!
Thanks for the answer! Just wondering, how does the function determine array quality?
There is a description and references in the help page.
Hi, Thanks a lot for your help!
Apologies, I'm still a bit confused -
As you write, "the relative reliability of each array is estimated by measuring how well the expression values for that array follow the linear model", but if I have two different treatment groups then I would expect the expression values to be different between the two groups - by weighting them according to how well they fit a linear model based on both treatment groups combined aren't I essentially adjusting out some of the effect of my treatment?
Thanks again for your help!
No, quite the opposite. The method clarifies the treatment effect rather than removing it.