What exactly are non-trivial nodes in a topGO analysis?
theophile • 0

Hi everybody,

I am a bit puzzled by the topGO results. The library seems very powerful, but I find the documentation rather sparse and cryptic. In particular, I would like to understand what the SigTerms / "non-trivial nodes" are. According to the library documentation and to the post "What is Nontrivial node in topGO analysis?", they are the

number of GO categories which have at least one significant gene annotated

As I understand it, this number should not depend on which statistical test I run to decide which GO terms are significant, because it depends only on which genes the selection function marks as significant in the input data. So if I run a Fisher test and a KS test on the same dataset, using the same threshold to define significant genes, I expect to obtain the same number of non-trivial nodes:

library(topGO)

#################################
# prepare the toy data
pvals <- c(0.372633450, 0.000195454, 0.699548147, 0.021062787, 0.732816144,
 0.805712054, 0.927868696, 0.737794221, 0.847279035, 0.662742785,
 0.204508888, 0.031615846, 0.543586800,  0.202557857, 0.410675473,
 0.394295637, 0.097123448, 0.882223568, 0.779278809, 0.926313327)

geneids <- c("ENSG00000148584", "ENSG00000175899", "ENSG00000094914", "ENSG00000114771", "ENSG00000103591",
 "ENSG00000087884", "ENSG00000127837", "ENSG00000131043", "ENSG00000149313", "ENSG00000008311",
 "ENSG00000183044", "ENSG00000165029", "ENSG00000085563", "ENSG00000005471", "ENSG00000115657",
 "ENSG00000131269", "ENSG00000023839", "ENSG00000108846", "ENSG00000117528", "ENSG00000164163")

smallset <- data.frame(GENEID = geneids, PADG = pvals)

ALPHA <- 0.01 #p-value threshold

#################################
# Run Fisher's exact test on a 0/1 vector (1 = significant gene)
fisher_set <- as.integer(smallset[, "PADG"] <= ALPHA)
names(fisher_set) <- smallset[, "GENEID"]

fisher_data <- new("topGOdata", ontology = "BP", allGenes = fisher_set, geneSel = function(x)(x == 1), 
                 nodeSize = 10, annot = annFUN.org, mapping = "org.Hs.eg.db", ID = "ENSEMBL")
top_algo <- "weight01"
top_stat <- "fisher"

fisher_results <- runTest(fisher_data, algorithm = top_algo, statistic = top_stat)
geneData(fisher_results)

#################################
# Run Kolmogorov-Smirnov (KS) test on the raw p-values
ks_set <- smallset[, "PADG"]
names(ks_set) <- smallset[, "GENEID"]

ks_data <- new("topGOdata", ontology = "BP", allGenes = ks_set, geneSel = function(x)(x <= ALPHA), 
                 nodeSize = 10, annot = annFUN.org, mapping = "org.Hs.eg.db", ID = "ENSEMBL")
top_algo <- "weight01"
top_stat <- "ks"

ks_results <- runTest(ks_data, algorithm = top_algo, statistic = top_stat,
    scoreOrder = "increasing")
geneData(ks_results)

However, this is what I get:

geneData(fisher_results)
  Annotated Significant    NodeSize    SigTerms 
         20           1          10          10
geneData(ks_results)
  Annotated Significant    NodeSize    SigTerms 
         20           1          10          16
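
To make my expectation concrete, here is a minimal sketch of how I would count such terms by hand, following the definition quoted above. It uses base R only; the term names, gene names and p-values are made up for illustration and are not real GO annotations.

```r
# Toy annotation: GO term -> genes annotated to it (hypothetical IDs, not real GO data)
term2genes <- list(
  "GO:A" = c("g1", "g2", "g3"),
  "GO:B" = c("g2", "g4"),
  "GO:C" = c("g4", "g5"),
  "GO:D" = c("g1", "g5")
)

# Toy p-values and a threshold playing the role of ALPHA
toy_pvals <- c(g1 = 0.0002, g2 = 0.37, g3 = 0.70, g4 = 0.02, g5 = 0.73)
toy_alpha <- 0.01

# Genes selected as significant (what geneSel would return TRUE for)
sig_genes <- names(toy_pvals)[toy_pvals <= toy_alpha]

# Under the quoted definition, a term is "non-trivial" if it contains
# at least one significant gene
has_sig <- vapply(term2genes, function(genes) any(genes %in% sig_genes), logical(1))
n_nontrivial <- sum(has_sig)
print(n_nontrivial)
```

On the real objects I would expect the same count from something like sum(vapply(genesInTerm(ks_data), function(g) any(g %in% sigGenes(ks_data)), logical(1))), but that is my assumption about how SigTerms is derived, not something I found in the documentation.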

Can anybody explain to me what is happening? Thanks!

sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
[4] LC_NUMERIC=C                   LC_TIME=French_France.1252    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] SparseM_1.81         org.Hs.eg.db_3.14.0  AnnotationDbi_1.56.2 IRanges_2.28.0      
[5] S4Vectors_0.32.4     Biobase_2.54.0       topGO_2.41.0         graph_1.72.0        
[9] BiocGenerics_0.40.0 

loaded via a namespace (and not attached):
  [1] nlme_3.1-155                bitops_1.0-7                matrixStats_0.61.0         
  [4] bit64_4.0.5                 doParallel_1.0.17           RColorBrewer_1.1-3         
  [7] httr_1.4.2                  GenomeInfoDb_1.30.1         backports_1.4.1            
 [10] tools_4.1.3                 utf8_1.2.2                  R6_2.5.1                   
 [13] lasso2_1.2-22               DBI_1.1.2                   colorspace_2.0-3           
 [16] GetoptLong_1.0.5            mnormt_2.0.2                tidyselect_1.1.2           
 [19] DESeq2_1.34.0               bit_4.0.4                   Nozzle.R1_1.1-1            
 [22] compiler_4.1.3              cli_3.2.0                   logging_0.10-108           
 [25] ggdendro_0.1.23             DelayedArray_0.20.0         scales_1.1.1               
 [28] psych_2.2.3                 genefilter_1.76.0           stringr_1.4.0              
 [31] digest_0.6.29               XVector_0.34.0              pkgconfig_2.0.3            
 [34] MatrixGenerics_1.6.0        fastmap_1.1.0               limma_3.50.1               
 [37] rlang_1.0.2                 GlobalOptions_0.1.2         rstudioapi_0.13            
 [40] RSQLite_2.2.12              shape_1.4.6                 generics_0.1.2             
 [43] BiocParallel_1.28.3         dplyr_1.0.8                 RCurl_1.98-1.6             
 [46] magrittr_2.0.2              GO.db_3.14.0                GenomeInfoDbData_1.2.7     
 [49] Matrix_1.4-0                Rcpp_1.0.8.3                munsell_0.5.0              
 [52] fansi_1.0.3                 lifecycle_1.0.1             stringi_1.7.6              
 [55] edgeR_3.36.0                MASS_7.3-55                 SummarizedExperiment_1.24.0
 [58] zlibbioc_1.40.0             plyr_1.8.6                  DEGreport_1.30.3           
 [61] grid_4.1.3                  blob_1.2.3                  parallel_4.1.3             
 [64] ggrepel_0.9.1               crayon_1.5.1                lattice_0.20-45            
 [67] cowplot_1.1.1               Biostrings_2.62.0           splines_4.1.3              
 [70] annotate_1.72.0             circlize_0.4.14             KEGGREST_1.34.0            
 [73] tmvnsim_1.0-2               locfit_1.5-9.5              knitr_1.37                 
 [76] ComplexHeatmap_2.10.0       pillar_1.7.0                GenomicRanges_1.46.1       
 [79] rjson_0.2.21                geneplotter_1.72.0          codetools_0.2-18           
 [82] XML_3.99-0.9                glue_1.6.2                  png_0.1-7                  
 [85] vctrs_0.4.0                 foreach_1.5.2               tidyr_1.2.0                
 [88] gtable_0.3.0                purrr_0.3.4                 reshape_0.8.8              
 [91] clue_0.3-60                 assertthat_0.2.1            cachem_1.0.6               
 [94] ggplot2_3.3.5               xfun_0.30                   xtable_1.8-4               
 [97] broom_0.7.12                ConsensusClusterPlus_1.58.0 survival_3.2-13            
[100] tibble_3.1.6                iterators_1.0.14            memoise_2.0.1              
[103] cluster_2.1.2               ellipsis_0.3.2