Dear all,
I'm analyzing a large number of microarray treatment experiments. I have modeled the expression level as the dependent variable, with the continuous variables time and dose and their interaction as predictors. As an R formula:
~ time + dose + time:dose
Now I would like to estimate the overall effect of the treatment experiment on a gene. A reasonable measure would be the amount of variance explained, i.e., R².
I have seen "Standard error and effect size from Limma" and "Effect size similar to Cohen's d in limma", but they deal with simple treatment/control experiments where the group means make up the log fold change and it is easy to compute Cohen's d. In my case I have three coefficients plus the intercept to take into account.
I have found this answer on stats.stackexchange that shows how to convert the F statistic to R². I'm a bit hesitant, however, because limma uses moderated statistics and I'm not sure how they affect the conversion. Maybe there is a better way altogether to estimate the effect size of the complete model.
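For reference, the conversion in question can be sketched in base R as follows. This is only the ordinary (unmoderated) relationship, R² = F·df1 / (F·df1 + df2), checked against lm() on one simulated gene; the function name r2_from_f and the time and dose values are placeholders, and whether limma's moderated F can be plugged in the same way is exactly the open question.

```r
# Convert an ordinary F statistic into R^2.
# df1 = numerator d.f. (number of tested coefficients, here 3)
# df2 = residual d.f.
r2_from_f <- function(f, df1, df2) {
  (f * df1) / (f * df1 + df2)
}

# Check against lm() on simulated data for a single gene.
# The time and dose values are hypothetical placeholders.
set.seed(1)
d <- expand.grid(time = c(0, 4, 8), dose = c(0, 4, 8, 16), rep = 1:3)
d$y <- 5 + 0.1 * d$time + 0.05 * d$dose + rnorm(nrow(d))

fit <- summary(lm(y ~ time + dose + time:dose, data = d))
f <- fit$fstatistic  # named vector: value, numdf, dendf

r2 <- r2_from_f(f[["value"]], f[["numdf"]], f[["dendf"]])
all.equal(r2, fit$r.squared)
```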
Thank you for any insights.
What do you mean by "the overall effect of the treatment experiment"? Do you mean the effect of the dose factor? If you're getting an F-statistic, you must be performing some kind of DE comparison; what's your code?
My exact code is at work, but it goes something like this: I have a data frame that describes the expression levels. The experiments were measured at 3 time points after application of the drug and at 4 different dosage levels. So you could construct it in the following way:
The output is then:
This is linked to the expression levels, which have been mapped to Ensembl gene IDs. For each of the unique combinations in the table above I have 3 replicates, so there are actually 36 expression columns.
The questions I want to answer are: (1) Does a gene respond to the treatment at all (at any dosage level)? (2) I want to characterize the strength of the response (hence the effect size) so that I can compare gene response between different drug treatment experiments. I want a single numerical indicator of a gene's response per group of experiments, which is why I used a model that includes time and dose as continuous variables over all points rather than a factorial design.
A different approach would be to use a factorial design and then find a way to summarize the effect size of each contrast such that I obtain one effect size per drug treatment experiment group.
I'm not sure it makes any sense to have an interaction term for two real-valued covariates. The entry in the design matrix ends up being the product of time and dose for each sample, which has no obvious interpretation. I'd go with a factorial design; you've got enough residual d.f. for it. Set up your contrast matrix to test for any DE between dosages at each time point, and do this for all time points. Alternatively, you could test for differential effects of time between dosages, e.g., whether the amount of DE for dose 0 between times 0 and 8 is different from that for dose 16. In any case, the F-statistic that you get out of that will represent the (time-matched) effect of treatment.
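To make both points concrete, here is a minimal base R sketch. It assumes 3 time points and 4 doses with 3 replicates each; times 0 and 8 and doses 0 and 16 are the ones mentioned here, and the intermediate values are made-up placeholders. It checks that the interaction column is just time multiplied by dose, and counts the residual d.f. left by a one-group-per-combination design.

```r
# Hypothetical sample table: 3 times x 4 doses x 3 replicates = 36 samples.
samples <- expand.grid(time = c(0, 4, 8), dose = c(0, 4, 8, 16), rep = 1:3)

# Continuous model: the interaction column is the product time * dose.
design <- model.matrix(~ time + dose + time:dose, data = samples)
colnames(design)
all(design[, "time:dose"] == samples$time * samples$dose)

# Factorial alternative: one column per time/dose combination.
group <- factor(paste0("t", samples$time, "_d", samples$dose))
factorial <- model.matrix(~ 0 + group)
nrow(factorial) - ncol(factorial)  # residual d.f.: 36 - 12 = 24
```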
What would your recommendation then be to summarize that information? Simply record the maximum value of the F-statistic over all contrasts?
Well, you can get a single F-statistic by combining all contrasts into a single matrix.
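As a rough illustration of what that single F-statistic tests, here is a base R sketch for one simulated gene. It is not limma code: it compares the full factorial model against a time-only model, which is the classical, unmoderated F-test for "any difference between doses at any time point". The design values, the simulated expression and all variable names are placeholders. In limma, the analogous moderated statistic should be the F component that eBayes() stores on the fit after contrasts.fit().

```r
# Hypothetical design: 3 times x 4 doses x 3 replicates.
set.seed(42)
samples <- expand.grid(time = c(0, 4, 8), dose = c(0, 4, 8, 16), rep = 1:3)
samples$time_f <- factor(samples$time)
samples$group <- factor(paste0("t", samples$time, "_d", samples$dose))

# Simulated log-expression for a single gene with a mild dose effect.
samples$y <- 7 + 0.05 * samples$dose + rnorm(nrow(samples), sd = 0.5)

# Full model: one mean per time/dose group (12 parameters).
full <- lm(y ~ 0 + group, data = samples)
# Null model: doses do not differ within any time point (3 parameters).
reduced <- lm(y ~ time_f, data = samples)

# One F-statistic covering all 9 dose-vs-dose differences at once.
cmp <- anova(reduced, full)
f_stat <- cmp$F[2]
df_num <- cmp$Df[2]      # 12 - 3 = 9 numerator d.f.
df_den <- cmp$Res.Df[2]  # 36 - 12 = 24 residual d.f.
c(F = f_stat, df1 = df_num, df2 = df_den)
```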