Hi everyone, I have a general question on how best to analyse data. Background: I have an RNA-seq dataset with two treatments in a full factorial design: Diet (A or B) and Drug (C or D).
I analysed this dataset in two different ways. The first way is an ANOVA-like style, where I got all the DE genes for each main treatment, plus the interaction.
The second way I analysed this data is using specific contrasts... so changes in diet responses in each drug treatment OR changes in drug response in each diet.
The main thing I noticed was that not many genes showed a significant diet*drug interaction (16 genes). However, when I look at the specific contrasts, the number of differentially expressed genes differs a lot between them. For instance, for genes that respond to diet, group C = 80 genes, whereas group D = 900 genes. Wouldn't this imply that many genes should show an interaction?
This has me thinking about the best way to analyse this kind of data, and from scrolling through forums there are mixed views. I have heard that it's much better just to look at single contrasts, but have also heard that these ANOVA-like approaches are OK. Any advice would be much appreciated.
Thanks very much! That's what I did for my contrasts.
Would it be possible if you could point me to some literature on this topic (why individual tests are better than ANOVA-like methods)? I need to convince a senior coauthor...
I wouldn't say one is better than the other. They are just different. The conventional thing to do (what I was taught to do at least) is to fit what you are calling the ANOVA-style model, and then first test for the interaction, remove the significant genes for that test and then look at the main effects. The rationale being that any non-significant interaction means there isn't an interaction in which case you can look at main effects. But lack of significance isn't the same as no interaction!
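To make the two approaches concrete, here is a minimal sketch of how the individual contrasts, the main effect and the interaction relate to each other in a cell-means setup (one mean per group). The group ordering, the contrast names and the expression values are all made up for illustration; this is not the output of any particular package.

```python
# Group order for a cell-means parameterisation (one mean per group).
# Labels combine Diet (A/B) and Drug (C/D); the ordering is an assumption.
GROUPS = ["A.C", "B.C", "A.D", "B.D"]

# Contrast weights, one per group, in the order given by GROUPS.
CONTRASTS = {
    "diet_in_C":   [-1, 1, 0, 0],            # B.C - A.C
    "diet_in_D":   [0, 0, -1, 1],            # B.D - A.D
    "diet_main":   [-0.5, 0.5, -0.5, 0.5],   # average of the two diet effects
    "interaction": [1, -1, -1, 1],           # (B.D - A.D) - (B.C - A.C)
}


def apply_contrast(weights, means):
    """Return the weighted sum of group means for one contrast."""
    return sum(w * m for w, m in zip(weights, means))


# Hypothetical log2 expression means for a single gene, in GROUPS order.
gene_means = [5.0, 5.5, 5.0, 6.5]

effects = {name: apply_contrast(w, gene_means) for name, w in CONTRASTS.items()}
for name, value in effects.items():
    print(f"{name:12s} {value:+.2f}")
```

The interaction is just the difference between the two individual diet contrasts, and the main effect is their average, so all of these tests are different questions asked of the same fitted group means.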
Unfortunately, when you are doing a bazillion tests with very little replication, that rationale doesn't really hold. As you have already noted, you can have a bunch of genes that really look like they should have a significant interaction, but the p-value won't support that conclusion. Those genes often won't have a significant main effect either, because they look like they are affected differently depending on the drug. But they may well be significant in an individual test (maybe for both drugs, with flipped signs).
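A rough back-of-the-envelope sketch of why this happens: the interaction is a difference of differences, so its estimate is noisier than either individual contrast. The numbers below (replicates per group, standard deviation, effect sizes) are hypothetical, and the normal approximation with a known, common variance is a simplification I am assuming for illustration; real RNA-seq tools use count models or moderated statistics plus multiple-testing correction.

```python
import math

# Assumptions for illustration only: equal replication, a common known
# standard deviation on the log2 scale, and independent samples.
n = 3          # hypothetical replicates per group
sigma = 0.5    # hypothetical per-sample standard deviation


def contrast_se(weights, sigma, n):
    """Standard error of a contrast of group means with n samples per group."""
    return sigma * math.sqrt(sum(w * w for w in weights) / n)


def two_sided_p(estimate, se):
    """Two-sided p-value from a normal approximation (z test)."""
    z = abs(estimate) / se
    return math.erfc(z / math.sqrt(2))


# Weights are in the group order A.C, B.C, A.D, B.D.
se_simple = contrast_se([-1, 1, 0, 0], sigma, n)        # diet effect within one drug
se_interaction = contrast_se([1, -1, -1, 1], sigma, n)  # difference of diet effects

# Hypothetical diet effects (log2 fold changes) for one gene.
effect_C = 0.4
effect_D = 1.2
interaction = effect_D - effect_C

p_C = two_sided_p(effect_C, se_simple)
p_D = two_sided_p(effect_D, se_simple)
p_int = two_sided_p(interaction, se_interaction)

print(f"SE ratio (interaction / simple): {se_interaction / se_simple:.3f}")
print(f"diet in C:   p = {p_C:.3f}")
print(f"diet in D:   p = {p_D:.3f}")
print(f"interaction: p = {p_int:.3f}")
```

Under these assumptions the gene passes a 0.05 cutoff for diet in D, misses it in C, and also misses it for the interaction, so it would be counted among the D-only genes without being counted as an interaction gene. Counting genes on either side of a threshold in two contrasts is not the same as testing whether the two effects differ.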
In the end it depends on what you are after. If you want the most power to detect differences, then doing the individual tests is going to provide that. And if you assume this is just hypothesis generation, or better yet, are planning on doing some gene set testing of some sort, then you want to get as many positive hits as you can. My usual MO is to present the senior coauthor with my best argument, and if it gets shot down, I go with what they want. In the end my name goes between the et and the al, so there you go.