Hi!
I learnt from the this old post (limma roast syntax for overall anova) that ROAST cannot be performed on multiple contrasts (i.e., with the F-test). This is still true, I suppose? In that post it was suggested to use geneSetTest
with Fstatistic (from topTable
).
First, my understanding was that roast
and geneSetTest
test different hypotheses, i.e., self-contained vs competitive. So this is not exactly apples and apples, is it?
Second, there is also cameraPR
now available in limma. What is the recommendation for using geneSetTest
vs cameraPR
? In my analysis, I get very good significance (p<0.001) with geneSetTest
and no significance (p>0.5) with cameraPR
(once the inter gene correlation is included). I would like to know which to believe/interpret.
Third, more generally, in my analyses, I find situations where I apply either roast
or camera
on a single gene set, I get discordant conclusions. I understand they test different hypotheses. I do not want to indulge in p-value hacking and pick the test that fits my story. So do you have suggestions as to how to go about this in a consistent manner. (as an aside, I can suggest an explanation for no significance in ROAST
but significance in CAMERA
, when the effects are so small and limited to be insignificant overall due to multiple testing, but gene set of interest has all the genes on which there is some effect).
Thanks
Hi Aaron
Thank you for your reply. If I may ask a couple of follow up questions.
1. Suppose I have identified a set of genes from analysis of one data set and want to test if these same genes are interestingly regulated in another dataset. Could I still use
camera()/geneSetTest()
(to compare against background) or is there a more sound approach to deal with this? In other words, my gene set is not annotated gene set associated with a pathway but genes identified in another study.2. Suppose these genes are precisely of interest because they are highly co-regulated in the the sense that they have similar time courses. Is this then still accounted for the
inter.gene.corr = 0.01
incameraPR()
? Should I estimate the correlation usinginterGeneCorrelation()?
I would expect these genes to be very correlated and would one ever get significance in this scenario using camera due to the variance inflation?Thanks in advance.
For your first question: yes, that is fine, and is the basis of at least a few of the MSigDB collections. Just think about it; the annotated gene sets had to come from somewhere (i.e., previous data), they weren't given to us like manna from heaven. The key thing is that your two datasets should be generated independently. Obviously it would not make sense to define a gene set and then test it on the same dataset (or a dataset with some other dependencies, e.g., same patients).
For your second question; read the
?camera
documentation. The defaultinter.gene.corr=0.01
gives a better ranking but does not strictly control the error rate. If you don't care about the rankings across gene sets, and you only want to know whether your particular gene set of interest contains high-ranking DE genes, then estimating the correlation withinterGeneCorrelation()
seems to be the better approach as it provides correct type I error control.Remember, what you really want to know is whether or not the genes in the set are (more) DE. The correlation between genes is just a nuisance parameter that needs to be overcome to answer your real question. However, if you do not model the correlation correctly, a low p-value could be purely driven by correlations in the absence of any DE. In most cases, detecting a gene set as containing correlated genes is not particularly interesting; this would be expected by definition.