Question

What exactly are nontrivial nodes in topGO analysis?


theophile • 0


Hi everybody,

I am a bit puzzled by topGO results. The library seems very powerful, but I find the documentation quite sparse and cryptic. In particular, I would like to understand what SigTerms (the "nontrivial nodes") are. According to the library documentation and to the post "What is Nontrivial node in topGO analysis?", they are the

number of GO categories which have at least one significant gene annotated
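To make that definition concrete, here is a minimal base-R sketch on made-up data. The term names, gene names, p-values, and the helper `count_nontrivial` are all hypothetical and not part of topGO; the sketch only reflects how I read the definition above, not how topGO computes the number internally.

```r
# Hypothetical toy annotation: GO term -> annotated genes (made-up names)
term2genes <- list(
  "GO:A" = c("g1", "g2", "g3"),
  "GO:B" = c("g2", "g4"),
  "GO:C" = c("g4", "g5"),
  "GO:D" = c("g1", "g5")
)

# Made-up p-values, one per gene
toy_pvals <- c(g1 = 0.001, g2 = 0.20, g3 = 0.50, g4 = 0.03, g5 = 0.90)

# Hypothetical helper: number of terms with at least one significant gene
count_nontrivial <- function(term2genes, pvals, alpha) {
  sig_genes <- names(pvals)[pvals <= alpha]
  has_sig <- vapply(term2genes,
                    function(genes) any(genes %in% sig_genes),
                    logical(1))
  sum(has_sig)
}

n_strict <- count_nontrivial(term2genes, toy_pvals, alpha = 0.01)
n_loose  <- count_nontrivial(term2genes, toy_pvals, alpha = 0.05)
print(c(strict = n_strict, loose = n_loose))
```

Under this reading, the count depends only on the annotation and on which genes pass the threshold; no test statistic enters the calculation.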

In my understanding, this number should be independent of the statistical test I run to decide which GO terms are significant, since it depends only on which genes the selection function marks as significant in the input data. Thus, if I run a Fisher test and a KS test on the same dataset, using the same threshold to define which genes are significant, I expect to obtain the same number of nontrivial nodes:

library(topGO)

#################################
# prepare the toy data
pvals <- c(0.372633450, 0.000195454, 0.699548147, 0.021062787, 0.732816144,
 0.805712054, 0.927868696, 0.737794221, 0.847279035, 0.662742785,
 0.204508888, 0.031615846, 0.543586800,  0.202557857, 0.410675473,
 0.394295637, 0.097123448, 0.882223568, 0.779278809, 0.926313327)

geneids <- c("ENSG00000148584", "ENSG00000175899", "ENSG00000094914", "ENSG00000114771", "ENSG00000103591",
 "ENSG00000087884", "ENSG00000127837", "ENSG00000131043", "ENSG00000149313", "ENSG00000008311",
 "ENSG00000183044", "ENSG00000165029", "ENSG00000085563", "ENSG00000005471", "ENSG00000115657",
 "ENSG00000131269", "ENSG00000023839", "ENSG00000108846", "ENSG00000117528", "ENSG00000164163")

smallset <- data.frame(GENEID = geneids, PADG = pvals)

ALPHA <- 0.01 #p-value threshold

#################################
# Run Fisher test
# 
fisher_set <- as.integer(smallset[, "PADG"] <= ALPHA)
names(fisher_set) <- smallset[, "GENEID"]

fisher_data <- new("topGOdata", ontology = "BP", allGenes = fisher_set, geneSel = function(x)(x == 1), 
                 nodeSize = 10, annot = annFUN.org, mapping = "org.Hs.eg.db", ID = "ENSEMBL")
top_algo <- "weight01"
top_stat <- "fisher"

fisher_results <- runTest(fisher_data, algorithm = top_algo, statistic = top_stat)
geneData(fisher_results)

#################################
# Run K-S test
# 
ks_set <- smallset[, "PADG"]
names(ks_set) <- smallset[, "GENEID"]

ks_data <- new("topGOdata", ontology = "BP", allGenes = ks_set, geneSel = function(x)(x <= ALPHA), 
                 nodeSize = 10, annot = annFUN.org, mapping = "org.Hs.eg.db", ID = "ENSEMBL")
top_algo <- "weight01"
top_stat <- "ks"

ks_results <- runTest(ks_data, algorithm = top_algo, statistic = top_stat,
    scoreOrder = "increasing")

However, this is what I get:

geneData(fisher_results)
  Annotated Significant    NodeSize    SigTerms 
         20           1          10          10

geneData(ks_results)
  Annotated Significant    NodeSize    SigTerms 
         20           1          10          16
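As a sanity check that does not involve topGO at all, the sketch below applies the two selection rules from the post to the same toy data and compares the genes they pick. The names `fisher_selected` and `ks_selected` are introduced here for illustration only.

```r
# Same toy data as in the post
pvals <- c(0.372633450, 0.000195454, 0.699548147, 0.021062787, 0.732816144,
           0.805712054, 0.927868696, 0.737794221, 0.847279035, 0.662742785,
           0.204508888, 0.031615846, 0.543586800, 0.202557857, 0.410675473,
           0.394295637, 0.097123448, 0.882223568, 0.779278809, 0.926313327)

geneids <- c("ENSG00000148584", "ENSG00000175899", "ENSG00000094914",
             "ENSG00000114771", "ENSG00000103591", "ENSG00000087884",
             "ENSG00000127837", "ENSG00000131043", "ENSG00000149313",
             "ENSG00000008311", "ENSG00000183044", "ENSG00000165029",
             "ENSG00000085563", "ENSG00000005471", "ENSG00000115657",
             "ENSG00000131269", "ENSG00000023839", "ENSG00000108846",
             "ENSG00000117528", "ENSG00000164163")

ALPHA <- 0.01

# Encoding used for the Fisher run: 0/1 indicator, selected with x == 1
fisher_set <- as.integer(pvals <= ALPHA)
names(fisher_set) <- geneids
fisher_selected <- names(fisher_set)[fisher_set == 1]

# Encoding used for the K-S run: raw p-values, selected with x <= ALPHA
ks_set <- pvals
names(ks_set) <- geneids
ks_selected <- names(ks_set)[ks_set <= ALPHA]

# Do both encodings flag exactly the same genes?
print(identical(fisher_selected, ks_selected))
print(fisher_selected)
```

If both rules select the same single gene, which would agree with the Significant count of 1 reported for both runs, the difference in SigTerms cannot come from the gene selection itself.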

Can anybody explain to me what is happening? Thanks!

sessionInfo()

R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SparseM_1.81         org.Hs.eg.db_3.14.0  AnnotationDbi_1.56.2 IRanges_2.28.0      
[5] S4Vectors_0.32.4     Biobase_2.54.0       topGO_2.41.0         graph_1.72.0        
[9] BiocGenerics_0.40.0 

loaded via a namespace (and not attached):
  [1] nlme_3.1-155                bitops_1.0-7                matrixStats_0.61.0         
  [4] bit64_4.0.5                 doParallel_1.0.17           RColorBrewer_1.1-3         
  [7] httr_1.4.2                  GenomeInfoDb_1.30.1         backports_1.4.1            
 [10] tools_4.1.3                 utf8_1.2.2                  R6_2.5.1                   
 [13] lasso2_1.2-22               DBI_1.1.2                   colorspace_2.0-3           
 [16] GetoptLong_1.0.5            mnormt_2.0.2                tidyselect_1.1.2           
 [19] DESeq2_1.34.0               bit_4.0.4                   Nozzle.R1_1.1-1            
 [22] compiler_4.1.3              cli_3.2.0                   logging_0.10-108           
 [25] ggdendro_0.1.23             DelayedArray_0.20.0         scales_1.1.1               
 [28] psych_2.2.3                 genefilter_1.76.0           stringr_1.4.0              
 [31] digest_0.6.29               XVector_0.34.0              pkgconfig_2.0.3            
 [34] MatrixGenerics_1.6.0        fastmap_1.1.0               limma_3.50.1               
 [37] rlang_1.0.2                 GlobalOptions_0.1.2         rstudioapi_0.13            
 [40] RSQLite_2.2.12              shape_1.4.6                 generics_0.1.2             
 [43] BiocParallel_1.28.3         dplyr_1.0.8                 RCurl_1.98-1.6             
 [46] magrittr_2.0.2              GO.db_3.14.0                GenomeInfoDbData_1.2.7     
 [49] Matrix_1.4-0                Rcpp_1.0.8.3                munsell_0.5.0              
 [52] fansi_1.0.3                 lifecycle_1.0.1             stringi_1.7.6              
 [55] edgeR_3.36.0                MASS_7.3-55                 SummarizedExperiment_1.24.0
 [58] zlibbioc_1.40.0             plyr_1.8.6                  DEGreport_1.30.3           
 [61] grid_4.1.3                  blob_1.2.3                  parallel_4.1.3             
 [64] ggrepel_0.9.1               crayon_1.5.1                lattice_0.20-45            
 [67] cowplot_1.1.1               Biostrings_2.62.0           splines_4.1.3              
 [70] annotate_1.72.0             circlize_0.4.14             KEGGREST_1.34.0            
 [73] tmvnsim_1.0-2               locfit_1.5-9.5              knitr_1.37                 
 [76] ComplexHeatmap_2.10.0       pillar_1.7.0                GenomicRanges_1.46.1       
 [79] rjson_0.2.21                geneplotter_1.72.0          codetools_0.2-18           
 [82] XML_3.99-0.9                glue_1.6.2                  png_0.1-7                  
 [85] vctrs_0.4.0                 foreach_1.5.2               tidyr_1.2.0                
 [88] gtable_0.3.0                purrr_0.3.4                 reshape_0.8.8              
 [91] clue_0.3-60                 assertthat_0.2.1            cachem_1.0.6               
 [94] ggplot2_3.3.5               xfun_0.30                   xtable_1.8-4               
 [97] broom_0.7.12                ConsensusClusterPlus_1.58.0 survival_3.2-13            
[100] tibble_3.1.6                iterators_1.0.14            memoise_2.0.1              
[103] cluster_2.1.2               ellipsis_0.3.2

topGO GO • 803 views

3.0 years ago • theophile