The question (finally!)

Question

WGCNA: Obtaining a consensus network from sampledBlockwiseModules function output

0

Entering edit mode

Caleb Bostwick ▴ 30

@caleb-bostwick-6580

Last seen 3.2 years ago

United States

Hello, I had a question arise when assessing the robustness of resampled gene modules using WGCNA. I would like to create a consensus network following an approach used in the article: Avelumab plus axitinib versus sunitinib in advanced renal cell carcinoma biomarker analysis of the phase 3 JAVELIN Renal 101 trial (Nat Med. 2020 Nov;26(11):1733-1741. doi: 10.1038/s41591-020-1044-8).

In the methods, the authors write that:

Baseline data (n = 720) were randomly subsampled down to 80% of samples for 1,000 times. With each subsampled dataset, we constructed a coexpression network using the WGCNA procedure as described above. An adjacency matrix was constructed by counting the frequencies that two genes were clustered into the same network module. A consensus network was then constructed using the new adjacency matrix via the standard WGCNA approach. Edges were then filtered by an adjacency measure of 0.8.

Searching through the WGCNA tutorials, I found a resampling tutorial (1. Example of module stability analysis using resampling of microarray samples) explaining the use of the function sampledBlockwiseModules. Using sampledBlockwiseModules on my baseline dataset (which includes 4924 genes and 749 samples that I randomly subsampled down to 80% (leaving 599 samples) for 1000 iterations), I was able to construct the 1000 coexpression networks as follows.

nRuns <- 1000
mods0 <- sampledBlockwiseModules(
  exp.mat,
  nRuns,
  replace = FALSE,
  fraction = 0.8,
  randomSeed = 12345,
  checkSoftPower = TRUE,
  nPowerCheckSamples = 2000,
  skipUnsampledCalculation = FALSE,
  corType = "pearson",
  power = 5,
  networkType = "signed",
  saveTOMs = FALSE,
  saveTOMFileBase = "sampled.TOM",
  verbose = 3)

Further following the above WGCNA tutorial, I created a matrix called labels from their provided code:

nGenes = ncol(exp.mat)
# Define a matrix of labels for the original and all resampling runs
labels = matrix(0, nGenes, nRuns + 1)
labels[, 1] = mods0[[1]]$mods$colors
# Relabel modules in each of the resampling runs so that full and resampled modules with best overlaps have
# the same labels. This is achieved by the function matchLabels.
pind = initProgInd()
for (r in 2:(nRuns+1))
{
labels[, r] = matchLabels(mods0[[r]]$mods$colors, labels[, 1])
pind = updateProgInd((r-1)/nRuns, pind)
}

The matrix labels consists of 1000 columns (corresponding to each iteration of sampledBlockwiseModules) and 4924 rows (corresponding to each gene). Each [i,j] entry consists of the module color (grey, pink, green, etc.) assigned to the ith gene in the jth network iteration. Similar to the following simulated data:

colors <- sample(c("grey", "green", "blue", "pink", "brown", "purple", "cyan", "red", "yellow"), 8, replace = TRUE)
labels <- matrix(rep(colors, each = 2), nrow = 8, ncol = 5)
labels
     [,1]     [,2]     [,3]     [,4]     [,5]    
[1,] "brown"  "yellow" "brown"  "yellow" "brown" 
[2,] "brown"  "yellow" "brown"  "yellow" "brown" 
[3,] "grey"   "brown"  "grey"   "brown"  "grey"  
[4,] "grey"   "brown"  "grey"   "brown"  "grey"  
[5,] "purple" "green"  "purple" "green"  "purple"
[6,] "purple" "green"  "purple" "green"  "purple"
[7,] "red"    "red"    "red"    "red"    "red"   
[8,] "red"    "red"    "red"    "red"    "red"

In this instance, we see that genes 1 and 2 are both members of the "brown" module in networks 1, 3, and 5 and both members of the "yellow" module in networks 2 and 4. We also see that genes 7 and 8 are part of the "red" module in every network (1, 2, 3, 4, and 5).

The question (finally!)

My question is, how can I convert from this labels matrix (or the sampledBlockwiseModules output) into an adjacency matrix by counting the frequencies that two genes were clustered into the same network module? From here, I think I can make the consensus network via the standard WGCNA approach, but I would also appreciate any input into the meaning of the filtering step:

Edges were then filtered by an adjacency measure of 0.8.

Thanks in advance to anyone reading this question and providing their assistance!

Best,

CB

sessionInfo( )
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Server 15 SP2

Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas_pthreads.so.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.14.0          AnnotationFilter_1.14.0   GenomicFeatures_1.42.3    GenomicRanges_1.42.0     
 [6] GenomeInfoDb_1.26.4       org.Hs.eg.db_3.12.0       AnnotationDbi_1.52.0      IRanges_2.24.1            S4Vectors_0.28.1         
[11] Biobase_2.50.0            BiocGenerics_0.36.0       survminer_0.4.9           ggpubr_0.4.0              survival_3.2-10          
[16] clusterProfiler_3.18.1    msigdbr_7.4.1             genefilter_1.72.1         RColorBrewer_1.1-2        stringr_1.4.0            
[21] WGCNA_1.70-3              fastcluster_1.2.3         dynamicTreeCut_1.63-1     edgeR_3.32.1              limma_3.46.0             
[26] reshape2_1.4.4            renv_0.14.0               Matrix_1.3-2              rJava_1.0-4               igraph_1.2.6             
[31] Runiversal_1.0.2          DT_0.19                   kableExtra_1.3.4          tidyr_1.1.3               plotly_4.9.4.1           
[36] ggplot2_3.3.5             dplyr_1.0.7               ProfilerAPI2_2.2.0       

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3              rtracklayer_1.50.0          R.methodsS3_1.8.1           bit64_4.0.5                 knitr_1.34                 
  [6] irlba_2.3.3                 DelayedArray_0.16.3         R.utils_2.10.1              data.table_1.14.0           rpart_4.1-15               
 [11] RCurl_1.98-1.5              doParallel_1.0.16           generics_0.1.0              preprocessCore_1.52.1       callr_3.7.0                
 [16] cowplot_1.1.1               usethis_2.1.3               RSQLite_2.2.8               shadowtext_0.0.9            proxy_0.4-26               
 [21] tzdb_0.1.2                  bit_4.0.4                   enrichplot_1.10.2           webshot_0.5.2               xml2_1.3.2                 
 [26] ggsci_2.9                   SummarizedExperiment_1.20.0 assertthat_0.2.1            viridis_0.6.1               xfun_0.26                  
 [31] hms_1.1.0                   jquerylib_0.1.4             babelgene_21.4              evaluate_0.14               progress_1.2.2             
 [36] fansi_0.5.0                 dbplyr_2.1.1                km.ci_0.5-2                 DBI_1.1.1                   htmlwidgets_1.5.4          
 [41] Rmpfr_0.8-4                 CVXR_1.0-9                  purrr_0.3.4                 ellipsis_0.3.2              crosstalk_1.1.1            
 [46] corrplot_0.90               backports_1.2.1             markdown_1.1                annotate_1.68.0             biomaRt_2.46.3             
 [51] MatrixGenerics_1.2.1        vctrs_0.3.8                 remotes_2.4.1               abind_1.4-5                 cachem_1.0.6               
 [56] withr_2.4.2                 ggforce_0.3.3               vroom_1.5.5                 checkmate_2.0.0             GenomicAlignments_1.26.0   
 [61] prettyunits_1.1.1           MultiAssayExperiment_1.16.0 mclust_5.4.7                svglite_2.0.0               cluster_2.1.1              
 [66] DOSE_3.16.0                 lazyeval_0.2.2              crayon_1.4.1                labeling_0.4.2              pkgconfig_2.0.3            
 [71] tweenr_1.0.2                ProtGenerics_1.22.0         vipor_0.4.5                 pkgload_1.2.3               nnet_7.3-15                
 [76] devtools_2.4.2              rlang_0.4.11                lifecycle_1.0.0             downloader_0.4              BiocFileCache_1.14.0       
 [81] rprojroot_2.0.2             polyclip_1.10-0             matrixStats_0.61.0          KMsurv_0.1-5                carData_3.0-4              
 [86] zoo_1.8-9                   Rhdf5lib_1.12.1             boot_1.3-27                 base64enc_0.1-3             beeswarm_0.4.0             
 [91] processx_3.5.2              pheatmap_1.0.12             png_0.1-7                   viridisLite_0.4.0           ini_0.3.1                  
 [96] bitops_1.0-7                R.oo_1.24.0                 rhdf5filters_1.2.0          MOFA_1.6.2                  Biostrings_2.58.0          
[101] blob_1.2.2                  qvalue_2.22.0               rstatix_0.7.0               jpeg_0.1-9                  ggsignif_0.6.3             
[106] scales_1.1.1                memoise_2.0.0               magrittr_2.0.1              plyr_1.8.6                  zlibbioc_1.36.0            
[111] compiler_4.0.5              scatterpie_0.1.7            QuaternaryProd_1.24.0       Rsamtools_2.6.0             cli_3.1.0                  
[116] XVector_0.30.0              ps_1.6.0                    htmlTable_2.2.1             Formula_1.2-4               MASS_7.3-53.1              
[121] tidyselect_1.1.1            stringi_1.7.4               highr_0.9                   yaml_2.2.1                  GOSemSim_2.16.1            
[126] askpass_1.1                 locfit_1.5-9.4              latticeExtra_0.6-29         ggrepel_0.9.1               survMisc_0.5.5             
[131] grid_4.0.5                  sass_0.4.0                  fastmatch_1.1-3             tools_4.0.5                 rstudioapi_0.13            
[136] RPresto_1.3.7               foreach_1.5.1               foreign_0.8-81              gridExtra_2.3               farver_2.1.0               
[141] ggraph_2.0.5                RcppZiggurat_0.1.6          digest_0.6.27               rvcheck_0.1.8               BiocManager_1.30.16        
[146] ggtext_0.1.1                gridtext_0.1.4              Rcpp_1.0.7                  car_3.0-12                  broom_0.7.10               
[151] httr_1.4.2                  colorspace_2.0-2            rvest_1.0.1                 XML_3.99-0.8                fs_1.5.0                   
[156] reticulate_1.22             splines_4.0.5               graphlayouts_0.7.1          sessioninfo_1.2.1           systemfonts_1.0.2          
[161] xtable_1.8-4                gmp_0.6-2                   jsonlite_1.7.2              tidygraph_1.2.0             Rfast_2.0.3                
[166] ggfun_0.0.4                 testthat_3.1.0              R6_2.5.1                    Hmisc_4.5-0                 pillar_1.6.2               
[171] htmltools_0.5.2             glue_1.4.2                  fastmap_1.1.0               BiocParallel_1.24.1         class_7.3-18               
[176] codetools_0.2-18            fgsea_1.16.0                pkgbuild_1.2.0              utf8_1.2.2                  bslib_0.3.0                
[181] lattice_0.20-41             tibble_3.1.4                curl_4.3.2                  ggbeeswarm_0.6.0            settings_0.2.7             
[186] openssl_1.4.5               GO.db_3.12.1                CompQuadForm_1.4.3          rmarkdown_2.11              desc_1.4.0                 
[191] munsell_0.5.0               e1071_1.7-9                 DO.db_2.9                   rhdf5_2.34.0                GenomeInfoDbData_1.2.4     
[196] iterators_1.0.13            impute_1.64.0               gtable_0.3.0

Network consensus sampledBlockwiseModules resampling WGCNA • 1.7k views

ADD COMMENT • link updated 3.2 years ago by Peter Langfelder ★ 3.0k • written 3.3 years ago by Caleb Bostwick ▴ 30

0

Entering edit mode

Not sure if this is the best way to do it, but for an individual vector of module labels, you can use something like inSameModule = outer(labels, labels,==) . I would define a vector holding the component-wise sum of inSameModule, something like

countInSameModule = rep(0, nGenes*nGenes)
for (run in 1:nRuns)
  countInSameModule = countInSameModule + outer(labels[, run], labels[, run], FUN=`==`)

Then you can define the adjacency as something like

adjacency = matrix(countInSameModule > 0.8*nRuns, nGenes, nGenes)

assign colnames and rownames and run clustering etc.

Having said all this, my personal experience is that the resampling approaches are good for evaluation of module stability but less good if you want to define modules with well-defined biological function (i.e., strong and specific enrichment).

ADD REPLY • link 3.2 years ago Peter Langfelder ★ 3.0k