Hello, I had a question arise when assessing the robustness of resampled gene modules using WGCNA. I would like to create a consensus network following an approach used in the article: Avelumab plus axitinib versus sunitinib in advanced renal cell carcinoma biomarker analysis of the phase 3 JAVELIN Renal 101 trial (Nat Med. 2020 Nov;26(11):1733-1741. doi: 10.1038/s41591-020-1044-8).
In the methods, the authors write that:
Baseline data (n = 720) were randomly subsampled down to 80% of samples for 1,000 times. With each subsampled dataset, we constructed a coexpression network using the WGCNA procedure as described above. An adjacency matrix was constructed by counting the frequencies that two genes were clustered into the same network module. A consensus network was then constructed using the new adjacency matrix via the standard WGCNA approach. Edges were then filtered by an adjacency measure of 0.8.
Searching through the WGCNA tutorials, I found a resampling tutorial (1. Example of module stability analysis using resampling of microarray samples) explaining the use of the function sampledBlockwiseModules. Using sampledBlockwiseModules on my baseline dataset (which includes 4924 genes and 749 samples that I randomly subsampled down to 80% (leaving 599 samples) for 1000 iterations), I was able to construct the 1000 coexpression networks as follows.
nRuns <- 1000
mods0 <- sampledBlockwiseModules(
exp.mat,
nRuns,
replace = FALSE,
fraction = 0.8,
randomSeed = 12345,
checkSoftPower = TRUE,
nPowerCheckSamples = 2000,
skipUnsampledCalculation = FALSE,
corType = "pearson",
power = 5,
networkType = "signed",
saveTOMs = FALSE,
saveTOMFileBase = "sampled.TOM",
verbose = 3)
Further following the above WGCNA tutorial, I created a matrix called labels
from their provided code:
nGenes = ncol(exp.mat)
# Define a matrix of labels for the original and all resampling runs
labels = matrix(0, nGenes, nRuns + 1)
labels[, 1] = mods0[[1]]$mods$colors
# Relabel modules in each of the resampling runs so that full and resampled modules with best overlaps have
# the same labels. This is achieved by the function matchLabels.
pind = initProgInd()
for (r in 2:(nRuns+1))
{
labels[, r] = matchLabels(mods0[[r]]$mods$colors, labels[, 1])
pind = updateProgInd((r-1)/nRuns, pind)
}
The matrix labels
consists of 1000 columns (corresponding to each iteration of sampledBlockwiseModules) and 4924 rows (corresponding to each gene). Each [i,j] entry consists of the module color (grey, pink, green, etc.) assigned to the ith gene in the jth network iteration. Similar to the following simulated data:
colors <- sample(c("grey", "green", "blue", "pink", "brown", "purple", "cyan", "red", "yellow"), 8, replace = TRUE)
labels <- matrix(rep(colors, each = 2), nrow = 8, ncol = 5)
labels
[,1] [,2] [,3] [,4] [,5]
[1,] "brown" "yellow" "brown" "yellow" "brown"
[2,] "brown" "yellow" "brown" "yellow" "brown"
[3,] "grey" "brown" "grey" "brown" "grey"
[4,] "grey" "brown" "grey" "brown" "grey"
[5,] "purple" "green" "purple" "green" "purple"
[6,] "purple" "green" "purple" "green" "purple"
[7,] "red" "red" "red" "red" "red"
[8,] "red" "red" "red" "red" "red"
In this instance, we see that genes 1 and 2 are both members of the "brown" module in networks 1, 3, and 5 and both members of the "yellow" module in networks 2 and 4. We also see that genes 7 and 8 are part of the "red" module in every network (1, 2, 3, 4, and 5).
The question (finally!)
My question is, how can I convert from this labels
matrix (or the sampledBlockwiseModules output) into an adjacency matrix by counting the frequencies that two genes were clustered into the same network module?
From here, I think I can make the consensus network via the standard WGCNA approach, but I would also appreciate any input into the meaning of the filtering step:
Edges were then filtered by an adjacency measure of 0.8.
Thanks in advance to anyone reading this question and providing their assistance!
Best,
CB
sessionInfo( )
R version 4.0.5 (2021-03-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: SUSE Linux Enterprise Server 15 SP2
Matrix products: default
BLAS/LAPACK: /usr/lib64/libopenblas_pthreads.so.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8 LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices datasets utils methods base
other attached packages:
[1] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.14.0 AnnotationFilter_1.14.0 GenomicFeatures_1.42.3 GenomicRanges_1.42.0
[6] GenomeInfoDb_1.26.4 org.Hs.eg.db_3.12.0 AnnotationDbi_1.52.0 IRanges_2.24.1 S4Vectors_0.28.1
[11] Biobase_2.50.0 BiocGenerics_0.36.0 survminer_0.4.9 ggpubr_0.4.0 survival_3.2-10
[16] clusterProfiler_3.18.1 msigdbr_7.4.1 genefilter_1.72.1 RColorBrewer_1.1-2 stringr_1.4.0
[21] WGCNA_1.70-3 fastcluster_1.2.3 dynamicTreeCut_1.63-1 edgeR_3.32.1 limma_3.46.0
[26] reshape2_1.4.4 renv_0.14.0 Matrix_1.3-2 rJava_1.0-4 igraph_1.2.6
[31] Runiversal_1.0.2 DT_0.19 kableExtra_1.3.4 tidyr_1.1.3 plotly_4.9.4.1
[36] ggplot2_3.3.5 dplyr_1.0.7 ProfilerAPI2_2.2.0
loaded via a namespace (and not attached):
[1] rappdirs_0.3.3 rtracklayer_1.50.0 R.methodsS3_1.8.1 bit64_4.0.5 knitr_1.34
[6] irlba_2.3.3 DelayedArray_0.16.3 R.utils_2.10.1 data.table_1.14.0 rpart_4.1-15
[11] RCurl_1.98-1.5 doParallel_1.0.16 generics_0.1.0 preprocessCore_1.52.1 callr_3.7.0
[16] cowplot_1.1.1 usethis_2.1.3 RSQLite_2.2.8 shadowtext_0.0.9 proxy_0.4-26
[21] tzdb_0.1.2 bit_4.0.4 enrichplot_1.10.2 webshot_0.5.2 xml2_1.3.2
[26] ggsci_2.9 SummarizedExperiment_1.20.0 assertthat_0.2.1 viridis_0.6.1 xfun_0.26
[31] hms_1.1.0 jquerylib_0.1.4 babelgene_21.4 evaluate_0.14 progress_1.2.2
[36] fansi_0.5.0 dbplyr_2.1.1 km.ci_0.5-2 DBI_1.1.1 htmlwidgets_1.5.4
[41] Rmpfr_0.8-4 CVXR_1.0-9 purrr_0.3.4 ellipsis_0.3.2 crosstalk_1.1.1
[46] corrplot_0.90 backports_1.2.1 markdown_1.1 annotate_1.68.0 biomaRt_2.46.3
[51] MatrixGenerics_1.2.1 vctrs_0.3.8 remotes_2.4.1 abind_1.4-5 cachem_1.0.6
[56] withr_2.4.2 ggforce_0.3.3 vroom_1.5.5 checkmate_2.0.0 GenomicAlignments_1.26.0
[61] prettyunits_1.1.1 MultiAssayExperiment_1.16.0 mclust_5.4.7 svglite_2.0.0 cluster_2.1.1
[66] DOSE_3.16.0 lazyeval_0.2.2 crayon_1.4.1 labeling_0.4.2 pkgconfig_2.0.3
[71] tweenr_1.0.2 ProtGenerics_1.22.0 vipor_0.4.5 pkgload_1.2.3 nnet_7.3-15
[76] devtools_2.4.2 rlang_0.4.11 lifecycle_1.0.0 downloader_0.4 BiocFileCache_1.14.0
[81] rprojroot_2.0.2 polyclip_1.10-0 matrixStats_0.61.0 KMsurv_0.1-5 carData_3.0-4
[86] zoo_1.8-9 Rhdf5lib_1.12.1 boot_1.3-27 base64enc_0.1-3 beeswarm_0.4.0
[91] processx_3.5.2 pheatmap_1.0.12 png_0.1-7 viridisLite_0.4.0 ini_0.3.1
[96] bitops_1.0-7 R.oo_1.24.0 rhdf5filters_1.2.0 MOFA_1.6.2 Biostrings_2.58.0
[101] blob_1.2.2 qvalue_2.22.0 rstatix_0.7.0 jpeg_0.1-9 ggsignif_0.6.3
[106] scales_1.1.1 memoise_2.0.0 magrittr_2.0.1 plyr_1.8.6 zlibbioc_1.36.0
[111] compiler_4.0.5 scatterpie_0.1.7 QuaternaryProd_1.24.0 Rsamtools_2.6.0 cli_3.1.0
[116] XVector_0.30.0 ps_1.6.0 htmlTable_2.2.1 Formula_1.2-4 MASS_7.3-53.1
[121] tidyselect_1.1.1 stringi_1.7.4 highr_0.9 yaml_2.2.1 GOSemSim_2.16.1
[126] askpass_1.1 locfit_1.5-9.4 latticeExtra_0.6-29 ggrepel_0.9.1 survMisc_0.5.5
[131] grid_4.0.5 sass_0.4.0 fastmatch_1.1-3 tools_4.0.5 rstudioapi_0.13
[136] RPresto_1.3.7 foreach_1.5.1 foreign_0.8-81 gridExtra_2.3 farver_2.1.0
[141] ggraph_2.0.5 RcppZiggurat_0.1.6 digest_0.6.27 rvcheck_0.1.8 BiocManager_1.30.16
[146] ggtext_0.1.1 gridtext_0.1.4 Rcpp_1.0.7 car_3.0-12 broom_0.7.10
[151] httr_1.4.2 colorspace_2.0-2 rvest_1.0.1 XML_3.99-0.8 fs_1.5.0
[156] reticulate_1.22 splines_4.0.5 graphlayouts_0.7.1 sessioninfo_1.2.1 systemfonts_1.0.2
[161] xtable_1.8-4 gmp_0.6-2 jsonlite_1.7.2 tidygraph_1.2.0 Rfast_2.0.3
[166] ggfun_0.0.4 testthat_3.1.0 R6_2.5.1 Hmisc_4.5-0 pillar_1.6.2
[171] htmltools_0.5.2 glue_1.4.2 fastmap_1.1.0 BiocParallel_1.24.1 class_7.3-18
[176] codetools_0.2-18 fgsea_1.16.0 pkgbuild_1.2.0 utf8_1.2.2 bslib_0.3.0
[181] lattice_0.20-41 tibble_3.1.4 curl_4.3.2 ggbeeswarm_0.6.0 settings_0.2.7
[186] openssl_1.4.5 GO.db_3.12.1 CompQuadForm_1.4.3 rmarkdown_2.11 desc_1.4.0
[191] munsell_0.5.0 e1071_1.7-9 DO.db_2.9 rhdf5_2.34.0 GenomeInfoDbData_1.2.4
[196] iterators_1.0.13 impute_1.64.0 gtable_0.3.0
Not sure if this is the best way to do it, but for an individual vector of module labels, you can use something like
inSameModule = outer(labels, labels,
==)
. I would define a vector holding the component-wise sum of inSameModule, something likeThen you can define the adjacency as something like
assign
colnames
andrownames
and run clustering etc.Having said all this, my personal experience is that the resampling approaches are good for evaluation of module stability but less good if you want to define modules with well-defined biological function (i.e., strong and specific enrichment).