ANOVA-like vs direct contrast
2
0
Entering edit mode
fcamus • 0
@fcamus-15207
Last seen 3.0 years ago
United Kingdom

Hi everyone, I have a general question on how to best analyse data. Background: I have a RNASeq data with two different treatments in a full factorial design: Diet (A or B) and Drug (C or D).

I analysed this dataset in two different ways. The first way is an ANOVA-like style, where I got all the DE genes for each main treatment, plus the interaction.

The second way I analysed this data is using specific contrasts... so changes in diet responses in each drug treatment OR changes in drug response in each diet.

The main thing that I noticed was that there are not many genes that showed a significant interaction between diet*drug (16 genes). However when I look at the specific contrasts, they certainly differ in the number of genes being differentially expressed. For instance, when I look at genes that respond to diet, group C = 80 genes, whereas group D = 900 genes. This would imply that there should be many genes interacting?

This has me thinking about the best way to analyse this kind of data, and from scrolling though forums there are mixed views. I have heard that its much better just to look at single contrasts, but also heard that these ANOVA-like approaches are OK? Any advice would me much appreciated.

edgeR • 1.1k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 11 hours ago
United States

Interactions tend to be less powerful than individual tests, so it's not that surprising that you would have genes that look like they should have a significant interaction but don't. Anyway, the general advice you will get around these parts is to fit a cell means model, where each cell is the combination of diet and drug, and then make whatever contrasts you like, rather than using an ANOVA-style analysis. In other words, something like

> library(limma)
> d.f <- data.frame(Diet = factor(rep(c("A","B"), each = 4)), Drug = factor(rep(c("C","D"), each = 2, times = 2)))
> combo <- apply(d.f, 1, paste, collapse = "_")

> design <- model.matrix(~0 + combo)
> colnames(design) <- gsub("combo","", colnames(design))

> cont <- makeContrasts(A_C - A_D, B_C - B_D, A_C - B_C, A_D - B_C, A_C - A_D + B_C - B_D, levels = design)
> design
  A_C A_D B_C B_D
1   1   0   0   0
2   1   0   0   0
3   0   1   0   0
4   0   1   0   0
5   0   0   1   0
6   0   0   1   0
7   0   0   0   1
8   0   0   0   1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$combo
[1] "contr.treatment"

> cont
      Contrasts
Levels A_C - A_D B_C - B_D A_C - B_C A_D - B_C A_C - A_D + B_C - B_D
   A_C         1         0         1         0                     1
   A_D        -1         0         0         1                    -1
   B_C         0         1        -1        -1                     1
   B_D         0        -1         0         0                    -1
ADD COMMENT
0
Entering edit mode

Thanks very much! Thats what I did for my contrasts.

Would it be possible if you could point me to some literature on this topic (why individual tests are better than ANOVA-like methods)? I need to convince a senior coauthor...

ADD REPLY
0
Entering edit mode

I wouldn't say one is better than the other. They are just different. The conventional thing to do (what I was taught to do at least) is to fit what you are calling the ANOVA-style model, and then first test for the interaction, remove the significant genes for that test and then look at the main effects. The rationale being that any non-significant interaction means there isn't an interaction in which case you can look at main effects. But lack of significance isn't the same as no interaction!

Unfortunately, when you are doing a bazillion tests with very little replication that isn't really a thing. As you have already noted, you can have a bunch of genes that really look like they should have a significant interaction, but the p-value won't support that conclusion. Those genes for sure won't have a significant main effect either, because they sure do look like they are affected differently by the drug. But they may well have a significant individual test (maybe for both drugs, with flipped signs).

In the end it depends on what you are after. If you want the most power to detect differences, then doing the individual tests is going to provide that. And if you assume this is just hypothesis generation, or better yet, are planning on doing some gene set testing of some sort, then you want to get as many positive hits as you can. My usual MO is to present the senior coauthor with my best argument, and if it gets shot down, I go with what they want. In the end my name goes between the et and the al, so there you go.

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 1 day ago
WEHI, Melbourne, Australia

Dear fcamus,

This question is covered at some length in Section 9.5 of the limma User's Guide. Briefly, we not recommend the factorial model approach because the factorial definition of "main effects" do not have a clear meaning or interpretation in the genomic context. Rather than main effects, it is much better to compute the drug effects separately for the two diets, because the explicit contrasts are more interpretable and give more scientifically relevant information.

The interaction effect of course can just as easily be tested via direct contrasts as via the factorial model, so the factorial model is not needed for that purpose. Any hypothesis that can be tested via the factorial model can also be tested via contrasts, so it is entirely an issue of which model is most relevant and easiest to interpret rather than any question of statistical power or statistical validity.

This is not a statement about anova tests in general but specifically about factorial models.

ADD COMMENT
0
Entering edit mode

Many thanks for your response. much appreciated!

ADD REPLY

Login before adding your answer.

Traffic: 714 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6