Question

piano runGSA with input from DESEQ2

simonp.snoeck • 0

@simonpsnoeck-11800

Hi,

To perform a gene-set enrichment analysis, we used the following settings for the R function runGSA (piano package):

gsaRes_xxx<-runGSA(pval_xxx, geneSetStat="fisher", directions=fc_xxx, signifMethod="nullDist", adjMethod="BH", gsc=gsc, gsSizeLim=c(5,Inf))

with:

fc_xxx = log2 fold changes of the genes (output of DESeq2)

pval_xxx = the p-values (output of DESeq2). Or should we use the adjusted p-values from DESeq2?

This seemed to work. Can anyone confirm our settings?

Kind regards,

Simon

deseq2 gene ontology piano • 2.1k views

score 0 · Answer 1 · 2016-11-08

Note that Fisher's (combined probability) test tends to give low p-values to a very large number of genes. The method also tends to return gene-set p-values that correlate with gene-set size (see e.g. Fig 3B in Väremo et al. (2013)).
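To illustrate why Fisher's method readily produces low gene-set p-values, here is a minimal sketch in base R. It uses the textbook form of the test (statistic -2·sum(log p), compared against a chi-squared distribution with 2k degrees of freedom) and is not taken from the piano source. The helper name fisher_combine and the p-values are made up for the example.

```r
# Textbook Fisher combined probability test for a single gene set.
# fisher_combine is a hypothetical helper, not a piano function.
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))   # test statistic
  df <- 2 * length(p)        # degrees of freedom of the chi-squared null
  list(stat = stat, p = pchisq(stat, df = df, lower.tail = FALSE))
}

# Made-up gene-level p-values: the two sets differ in one gene only.
one_strong <- c(1e-12, 0.40, 0.55, 0.70, 0.90)
all_null   <- c(0.30, 0.40, 0.55, 0.70, 0.90)

res_strong <- fisher_combine(one_strong)
res_null   <- fisher_combine(all_null)

print(res_strong$p)  # very small: a single strong gene drives the whole set
print(res_null$p)    # large: nothing notable in this set
```

Because the statistic is a sum over genes, one very significant gene is enough to make a whole set look significant.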

Unadjusted p-values sometimes have a higher resolution (more unique values) than adjusted p-values, so in that sense they could be a good choice of input. The resulting gene-set p-values should, however, be adjusted for multiple testing. One could also use the adjusted p-values as input. Maybe someone with a more solid statistical background could comment on this?
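The point about resolution can be seen with base R's p.adjust. This is a small sketch with made-up p-values: the Benjamini-Hochberg adjustment enforces monotonicity, so several distinct raw p-values can collapse onto the same adjusted value.

```r
# Made-up raw p-values, all distinct.
p_raw <- c(0.001, 0.010, 0.011, 0.012, 0.2, 0.5)

# Benjamini-Hochberg adjustment (the same method name as adjMethod="BH").
p_bh <- p.adjust(p_raw, method = "BH")

n_unique_raw <- length(unique(p_raw))
n_unique_bh  <- length(unique(p_bh))

print(p_bh)
print(c(raw = n_unique_raw, adjusted = n_unique_bh))
```

Here three of the raw values end up tied after adjustment, so the adjusted vector ranks the genes less finely than the raw one.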

Apart from those notes, the syntax of your command looks correct to me.
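For completeness, a hedged sketch of how the two input vectors could be prepared. The data frame below is a made-up stand-in for as.data.frame(results(dds)); the column names log2FoldChange and pvalue are the ones DESeq2 uses, while the gene names and values are invented. DESeq2 can report NA p-values for some genes, so the sketch drops those and names both vectors by gene so they stay aligned.

```r
# Made-up stand-in for as.data.frame(results(dds)) from DESeq2.
res <- data.frame(
  log2FoldChange = c(2.1, -1.3, 0.2, NA),
  pvalue         = c(1e-5, 0.003, 0.8, NA),
  row.names      = c("geneA", "geneB", "geneC", "geneD")
)

# Keep only genes with both a p-value and a fold change.
keep <- !is.na(res$pvalue) & !is.na(res$log2FoldChange)

# Named vectors, in the same gene order.
pval_xxx <- setNames(res$pvalue[keep], rownames(res)[keep])
fc_xxx   <- setNames(res$log2FoldChange[keep], rownames(res)[keep])

# These would then be passed on as in the question, for example:
# gsaRes_xxx <- runGSA(pval_xxx, geneSetStat="fisher", directions=fc_xxx, ...)
```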

And a recommendation: once you have your gene-set results and conclusions, go back to the gene-level data for the specific gene sets and spot-check that the results are sensible given the input data.

Kind regards

Leif

score 0 · Answer 2 · 2016-11-08

Thanks Leif,

About those low p-values, how should we interpret the following case:

Genes (up)	Stat (mix.dir.up)	p (mix.dir.up)	p adj (mix.dir.up)	Genes (down)	Stat (mix.dir.dn)	p (mix.dir.dn)	p adj (mix.dir.dn)
13	1714.4	0	0	1	16.757	0.00022976	0.00022976
13	1714.4	0	0	1	16.757	0.00022976	0.00022976

In both cases only one gene is down-regulated (compared with 13 up-regulated). Yet the statistics for that single down-regulated gene still give a p-value < 0.05, which would suggest a significant effect on the GO term in question from one gene alone. Or are we interpreting this the wrong way?

Kind regards,

Simon

Yes, that does look a bit odd. Note that the mixed-directional score is calculated by essentially splitting the gene set into two parts, one with the up-regulated genes and one with the down-regulated genes. The two parts are "unaware" of each other. In this case it means that a subset consisting of a single (down-regulated) gene came out as fairly significant, probably because that gene was itself quite significant.
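The numbers in the table are consistent with this. As a sketch, assuming the Fisher statistic is -2·sum(log p) and the null distribution is chi-squared with 2k degrees of freedom (the textbook form, not checked against the piano source): for a subset of k = 1 gene, the gene-set p-value is simply that gene's own p-value.

```r
# Values copied from the "down" columns of the table above.
stat_down <- 16.757   # Stat (mix.dir.dn)
n_down    <- 1        # Genes (down)

# Gene-set p-value under a chi-squared null with 2 * k degrees of freedom.
p_down <- pchisq(stat_down, df = 2 * n_down, lower.tail = FALSE)

# With one gene, stat = -2 * log(p_gene), so p_gene = exp(-stat / 2).
p_gene <- exp(-stat_down / 2)

print(p_down)   # close to the reported 0.00022976
print(p_gene)   # the same value: the gene's own p-value
```

So the "significant" down-regulated part says only that one gene has a p-value of about 0.00023, not that the gene set as a whole is down-regulated.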

I would take the number of genes into account (as you do) when you interpret these results.

An alternative would be to choose a method that would also return the distinct directional score, which for your example gene-set would definitely mark it as affected by up-regulation, but not down-regulation (since it does not do the subsetting in that case).
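A rough sketch of the distinct-directional idea follows. It assumes the gene-level p-values are turned into one-sided p-values (p/2 if the gene changes in the tested direction, 1 - p/2 otherwise) before the whole gene set is combined. That transformation is how I understand the approach and should be checked against the piano documentation. The helpers to_one_sided and fisher_p and the gene values are made up, loosely mimicking the 13 up / 1 down example.

```r
# Hypothetical helper: one-sided p-values for a tested direction.
to_one_sided <- function(p, fc, direction = c("up", "down")) {
  direction <- match.arg(direction)
  agrees <- if (direction == "up") fc > 0 else fc < 0
  ifelse(agrees, p / 2, 1 - p / 2)
}

# Textbook Fisher combination over the full gene set (no subsetting).
fisher_p <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

# Made-up gene set: 13 up-regulated genes and 1 down-regulated gene.
p  <- c(rep(1e-6, 13), 0.00022976)
fc <- c(rep(2, 13), -1.5)

p_distinct_up   <- fisher_p(to_one_sided(p, fc, "up"))
p_distinct_down <- fisher_p(to_one_sided(p, fc, "down"))

print(p_distinct_up)    # very small: clear up-regulation
print(p_distinct_down)  # large: the 13 up genes count against down-regulation
```

Since all 14 genes enter both tests, the single down-regulated gene can no longer produce a significant result on its own.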

ADD REPLY • link 8.4 years ago Leif Väremo ▴ 70