Question

topGO: significant GO term, but 0 significant genes in this GO term (with the elim-KS method)


Amandine • 0

@Amandine-24659

Good morning all,

I'm having trouble with topGO, using my own annotations and the elimks method (with options: algorithm = "elim", statistic = "ks").

For several GO terms that come out significant (p-value < 0.05), none of the genes annotated to the term is significant.

Note : if I use the algorithm = "weight01" and statistic = "fisher" options, I don't have this problem.

https://www.casimages.com/i/210127093705575902.png.html

This image shows a subset of my result table: the "Significant" column contains 0 even though the GO term comes out as enriched in the analysis.

If you want to run the code that generates this result, here are the files needed to reproduce it.

https://filesender.renater.fr/?s=download&token=36969f68-f86f-4096-b5c1-96ee27fcb1d9

The download contains 3 files:

analyse_expression_differentielle_medium_virus_vs_medium_sain.txt: the differential gene expression analysis, performed with the NOISeq tool (genes with a probability > 0.95 are significant, which is similar to an FDR < 0.05).
blast2go.for_TOPGO_formatted.txt: the gene annotation file, generated with blast2go.
topgo_subscript.R: the script that reproduces the problem.

I contacted the developer of topGO, but he told me that he is no longer actively developing the package and advised me to post my problem here.

I could use the Fisher method, but what I like about the elimks method is that the enriched GO terms are less "general" than with the Fisher method: the whole point of elimks is to surface enriched GO terms that sit lower in the GO tree.

Any help would be welcome, thank you very much. Best, Amandine

topGO • 2.4k views

Updated 4.2 years ago by James W. MacDonald • written 4.2 years ago by Amandine

score 1 · Answer 1 · 2021-01-27

It's possible that you are conflating things. The elim/weight/classic methods are orthogonal to the test used. The former are just ways of adjusting the genes annotated to a particular GO term based on the significance of an offspring term: with the elim method, any genes in a significant offspring term are eliminated from contention in all ancestor terms. The weight algorithm uses weights in [0, 1] rather than being binary like elim, and the classic method ignores the issue.
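To illustrate the elim idea, here is a minimal base-R sketch on made-up data. The gene names, the two nested terms, and the 0.01 elimination cutoff are all assumptions chosen for illustration. It is not topGO's own implementation, which walks the whole GO graph bottom-up.

```r
# Hypothetical toy universe: 100 genes, of which the first 10 are "significant".
universe  <- paste0("g", 1:100)
sig_genes <- universe[1:10]

# The child term is nested inside the parent term, as in the GO graph.
child  <- universe[c(1:8, 11:12)]          # 10 genes, 8 of them significant
parent <- union(child, universe[13:32])    # 30 genes, the same 8 significant

# One-sided Fisher test for over-representation of significant genes in a term.
fisher_p <- function(term_genes) {
  in_term <- universe %in% term_genes
  is_sig  <- universe %in% sig_genes
  tab <- table(factor(in_term, levels = c(TRUE, FALSE)),
               factor(is_sig,  levels = c(TRUE, FALSE)))
  fisher.test(tab, alternative = "greater")$p.value
}

# "classic": every term is tested with all of its annotated genes.
p_child          <- fisher_p(child)
p_parent_classic <- fisher_p(parent)

# "elim" (sketch): if the child is significant, its genes are dropped
# from the parent before the parent is tested.
elim_cutoff <- 0.01                        # assumed cutoff for this example
parent_kept <- if (p_child < elim_cutoff) setdiff(parent, child) else parent
p_parent_elim <- fisher_p(parent_kept)

print(c(child = p_child,
        parent_classic = p_parent_classic,
        parent_elim = p_parent_elim))
```

In this toy case the parent looks enriched only because it inherits the child's genes. Once those are eliminated, the parent has no significant genes left and is no longer reported, which is why elim tends to favour more specific terms.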

But that's unrelated to the test. The Fisher test does a hypergeometric test based on the genes that are in the GO term versus those that are not, where the idea is to detect enrichment of genes from a particular GO term in your set of significant genes. Which heuristically makes sense to me. The KS test is a distributional test that is intended to determine if the distribution of one set of samples is different from the distribution of another set. I don't know how one should interpret a significant result from the KS test, particularly in this context. Simply having evidence that two distributions are 'different' is wholly unsatisfactory to me.
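The contrast between the two tests can be made concrete with a small base-R sketch on hypothetical scores (not the files from the question). It assumes that a count of "significant" genes simply means annotated genes passing a score cutoff, whereas the KS test only looks at where the term's scores fall within the full ranking. The scores, the cutoff, and the term membership below are invented for illustration.

```r
# Hypothetical toy data: 200 genes with evenly spaced p-value-like scores
# (lower score = stronger evidence of differential expression).
scores <- (1:200) / 200
names(scores) <- paste0("gene", 1:200)
cutoff <- 0.05
is_sig <- scores < cutoff                  # only the top-ranked genes pass

# A made-up GO term whose 20 genes all sit just above the cutoff
# (scores 0.055 to 0.15), so none of them counts as significant.
in_term <- seq_along(scores) %in% 11:30
n_sig_in_term <- sum(is_sig & in_term)

# Fisher / hypergeometric test: uses only the counts from the cutoff.
tab <- table(factor(in_term, levels = c(TRUE, FALSE)),
             factor(is_sig,  levels = c(TRUE, FALSE)))
fisher_p <- fisher.test(tab, alternative = "greater")$p.value

# Two-sample KS test: compares the score distribution of the term's genes
# with that of all other genes, ignoring the cutoff entirely.
# alternative = "greater" asks whether the term's scores tend to be smaller.
ks_p <- ks.test(scores[in_term], scores[!in_term],
                alternative = "greater")$p.value

print(c(n_sig_in_term = n_sig_in_term, fisher_p = fisher_p, ks_p = ks_p))
```

Here the term contains no gene below the cutoff, so the Fisher p-value is 1, yet the KS p-value is far below 0.05 because the term's genes are concentrated near the top of the ranking. Under the stated assumptions, that is one way a term can be reported as significant by a KS-based run while showing 0 significant genes.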

Is there any reason you don't want to do a Fisher test using the elim method?