Note that Fisher's (combined probability) test tends to give low p-values to a huge number of gene-sets. This method also tends to return gene-set p-values that correlate with gene-set size (see e.g. Fig. 3B in Väremo et al. (2013)).
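To make the size effect a bit more concrete, here is a minimal sketch of Fisher's combined probability test in plain Python. It is not the package's own code, and the input p-values are made up: every gene gets p = 0.2, so no single gene is significant. The closed-form chi-square tail used here only holds for even degrees of freedom, which is always the case for Fisher's method (2 per p-value).

```python
import math

def fisher_combined(pvalues):
    """Fisher's method: returns (statistic, combined p-value)."""
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    # Upper tail of a chi-square distribution with 2k degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

# Hypothetical gene-sets of increasing size, each gene with p = 0.2
results = {}
for size in (5, 20, 100):
    stat, p = fisher_combined([0.2] * size)
    results[size] = p
    print(size, round(stat, 2), p)
```

With these made-up inputs the combined p-value shrinks as the gene-set grows, even though the gene-level evidence is identical. That is one way a correlation between gene-set p-value and gene-set size can arise.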
Unadjusted p-values sometimes have a higher resolution (more unique values) than adjusted p-values, so in that sense they could be good to use as input. The gene-set p-values should, however, be adjusted for multiple testing. One could also use the adjusted p-values as input. Maybe someone with a more solid statistical background could add a comment on this?
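As a small illustration of the resolution point, the sketch below applies a Benjamini-Hochberg adjustment to a handful of made-up p-values. Benjamini-Hochberg is an assumption here, chosen only as a common adjustment method; other methods will behave differently in detail.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical gene-level p-values, all distinct
raw = [0.001, 0.012, 0.013, 0.014, 0.20, 0.21]
adj = bh_adjust(raw)
print(len(set(raw)), "unique raw values")
print(len(set(adj)), "unique adjusted values")
```

The monotonicity step is what creates ties: several distinct raw p-values can end up with the same adjusted value, so the adjusted list carries fewer unique values.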
Apart from those notes, the syntax of your command looks correct to me.
And a recommendation: once you have your gene-set results and conclusion, go back to the gene-level data for the specific gene-sets and spot-check/validate that your results are sensible given the input data.
About those low p-values, how should we interpret the following case:
Genes (up)  Stat (mix.dir.up)  p (mix.dir.up)  p adj (mix.dir.up)  Genes (down)  Stat (mix.dir.dn)  p (mix.dir.dn)  p adj (mix.dir.dn)
13          1714.4             0               0                   1             16.757             0.00022976      0.00022976
13          1714.4             0               0                   1             16.757             0.00022976      0.00022976
In both cases only one gene is down (compared with 13 up). Yet the statistics for that single down-regulated gene still give a p-value < 0.05, which would suggest a significant effect on the GO term in question from one gene alone. Or are we interpreting this in the wrong way?
Yes, that looks a bit weird of course. Note that the mixed-directional score is calculated by essentially subsetting the gene-set into two parts, one with the up-regulated genes and one with the down-regulated genes. The two parts are "unaware" of each other. In this case it means that a gene-set of 1 (down-regulated) gene came out as fairly significant, probably because the single gene itself was quite significant.
I would take the number of genes into account (as you do) when you interpret these results.
An alternative would be to choose a method that would also return the distinct directional score, which for your example gene-set would definitely mark it as affected by up-regulation, but not down-regulation (since it does not do the subsetting in that case).
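The numbers in the table are consistent with this. As a rough check in plain Python (not the package's code), and assuming the down-regulated gene's own p-value is 0.00022976, Fisher's statistic for a "gene-set" of one gene comes out at about 16.757 and the combined p-value is simply the gene's p-value again. The same tail formula applied to the reported up statistic (1714.4, 13 genes) suggests that the reported p-value of 0 is an underflow rather than an exact zero.

```python
import math

def fisher_statistic(pvalues):
    """Fisher's combined statistic: -2 * sum(ln p)."""
    return -2.0 * sum(math.log(p) for p in pvalues)

def chi2_upper_tail(stat, n_pvalues):
    """Upper tail of a chi-square with 2 * n_pvalues degrees of freedom."""
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, n_pvalues):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Down part: one gene, p-value assumed equal to the reported gene-set p-value
gene_p = 0.00022976
stat_dn = fisher_statistic([gene_p])
p_dn = chi2_upper_tail(stat_dn, 1)

# Up part: statistic and gene count taken from the table
p_up = chi2_upper_tail(1714.4, 13)

print(round(stat_dn, 3), p_dn, p_up)
```

So the down-regulated result carries no more information than the single gene's p-value, which is why the number of genes matters when reading the mixed-directional columns.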