Hi,
I am using RNA-Seq data for WGCNA and have 34 samples. For my WGCNA analysis I am using networkType = "signed hybrid", TOMType = "signed", corType = "bicor", pearsonFallback = "individual" and deepSplit = 2:
```r
nethybrid.2 = blockwiseModules(datExpr, power = softpower,
                               maxBlockSize = 46000, TOMType = "signed",
                               minModuleSize = 30, deepSplit = 2,
                               reassignThreshold = 0, mergeCutHeight = 0.25,
                               numericLabels = TRUE, pamRespectsDendro = FALSE,
                               saveTOMs = TRUE, networkType = "signed hybrid",
                               saveTOMFileBase = "34patient_signedhybrid_TOM_46000",
                               verbose = 5, corType = "bicor",
                               maxPOutliers = 0.1, pearsonFallback = "individual")
```
I understand that the maximum block size WGCNA can handle is 46000 genes, which is why I reduced my gene set to 45901. My aim is to analyse all genes together in a single block so that I get one TOM file for further network analysis. However, when I run the code above, my genes are divided into 2 blocks. Is there any way to prevent this division into multiple blocks? Or is this only possible if I follow the step-by-step WGCNA tutorials? I tried that too, but it hangs every time.
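One plausible origin of the ~46,000 figure, stated here as an assumption rather than documented WGCNA behaviour: an n × n matrix indexed with signed 32-bit integers can hold at most 2^31 − 1 entries, so n cannot exceed floor(sqrt(2^31 − 1)) = 46340. The sketch below computes that bound and checks whether a gene count fits in one block; `fitsInOneBlock` is a hypothetical helper, not a WGCNA function.

```r
# Largest square-matrix side whose element count fits in a signed 32-bit integer.
# Assumption: this is where WGCNA's ~46,000 genes-per-block limit comes from.
maxSide <- floor(sqrt(.Machine$integer.max))  # .Machine$integer.max is 2^31 - 1

# Hypothetical helper (not part of WGCNA): TRUE if all genes fit in a single
# block of size maxBlockSize and that block size respects the 32-bit bound.
fitsInOneBlock <- function(nGenes, maxBlockSize) {
  nGenes <= maxBlockSize && maxBlockSize <= maxSide
}

print(maxSide)                       # [1] 46340
print(fitsInOneBlock(45901, 46000))  # [1] TRUE
```

Under this assumption, 45901 genes with maxBlockSize = 46000 satisfy both conditions, so the gene count by itself would not be expected to force a second block.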
When I include saveTOMs = TRUE, saveTOMFileBase = "34patient_signedhybrid_TOM_46000" to save the TOM, the job runs for a very long time and never finishes. Is there any way to optimize it? I read the post "WGCNA blockwiseModules parallelisation question", which suggests using a fast BLAS library, but since my data set is much smaller than theirs, I thought there might be another solution as well?
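The TOM is a gene-by-gene matrix, so its size depends on the number of genes rather than on the 34 samples. A back-of-the-envelope sketch, assuming 8-byte double-precision values (the format WGCNA actually writes to disk may differ); `tomSizeGiB` is a hypothetical helper:

```r
# Rough memory footprint of a gene-by-gene TOM, assuming 8-byte doubles.
# Hypothetical helper, not a WGCNA function.
tomSizeGiB <- function(nGenes, bytesPerValue = 8) {
  n <- as.numeric(nGenes)                     # avoid integer overflow
  full  <- n^2 * bytesPerValue                # full n x n matrix
  lower <- n * (n - 1) / 2 * bytesPerValue    # lower triangle only (dist-like storage)
  c(full = full, lowerTriangle = lower) / 1024^3
}

round(tomSizeGiB(45901), 1)
```

For 45901 genes this gives roughly 15.7 GiB for the full matrix and 7.8 GiB for the lower triangle alone, so both computing and writing the TOM are dominated by the gene count.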
Thanks in advance,
Gokce
```
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.5 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] EnsDb.Hsapiens.v79_0.99.12 ensembldb_1.4.7
 [3] edgeR_3.14.0               limma_3.28.21
 [5] amap_0.8-14                sva_3.20.0
 [7] mgcv_1.8-15                nlme_3.1-128
 [9] doParallel_1.0.10          iterators_1.0.8
[11] foreach_1.4.3              reshape_0.8.5
[13] cluster_2.0.4              matrixStats_0.50.2
[15] flashClust_1.01-2          WGCNA_1.51
[17] fastcluster_1.1.21         dynamicTreeCut_1.63-1
[19] pheatmap_1.0.8             genefilter_1.54.2
[21] gplots_3.0.1               RColorBrewer_1.1-2
[23] vsn_3.40.0                 org.Hs.eg.db_3.3.0
[25] DESeq2_1.12.4              BiocParallel_1.6.6
[27] GenomicAlignments_1.8.4    SummarizedExperiment_1.2.3
[29] GenomicFeatures_1.24.5     AnnotationDbi_1.34.4
[31] Biobase_2.32.0             Rsamtools_1.24.0
[33] Biostrings_2.40.2          XVector_0.12.1
[35] GenomicRanges_1.24.3       GenomeInfoDb_1.8.7
[37] IRanges_2.6.1              S4Vectors_0.10.3
[39] BiocGenerics_0.18.0        Hmisc_3.17-4
[41] ggplot2_2.1.0              Formula_1.2-1
[43] survival_2.39-5            lattice_0.20-34

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    AnnotationHub_2.4.2
 [3] splines_3.3.1                 gtools_3.5.0
 [5] shiny_0.14                    interactiveDisplayBase_1.10.3
 [7] affy_1.50.0                   latticeExtra_0.6-28
 [9] impute_1.46.0                 RSQLite_1.0.0
[11] digest_0.6.10                 chron_2.3-47
[13] colorspace_1.2-6              httpuv_1.3.3
[15] htmltools_0.3.5               preprocessCore_1.34.0
[17] Matrix_1.2-7.1                plyr_1.8.4
[19] XML_3.98-1.4                  biomaRt_2.28.0
[21] zlibbioc_1.18.0               xtable_1.8-2
[23] GO.db_3.3.0                   scales_0.4.0
[25] gdata_2.17.0                  affyio_1.42.0
[27] annotate_1.50.0               nnet_7.3-12
[29] mime_0.5                      foreign_0.8-67
[31] BiocInstaller_1.22.3          tools_3.3.1
[33] data.table_1.9.6              munsell_0.4.3
[35] locfit_1.5-9.1                caTools_1.17.1
[37] grid_3.3.1                    RCurl_1.95-4.8
[39] bitops_1.0-6                  gtable_0.2.0
[41] codetools_0.2-14              DBI_0.5-1
[43] R6_2.1.3                      gridExtra_2.2.1
[45] rtracklayer_1.32.2            KernSmooth_2.23-15
[47] Rcpp_0.12.7                   geneplotter_1.50.0
[49] rpart_4.1-10                  acepack_1.3-3.3
```
Hi, some questions:
What do you mean by 'it never finishes'? How long did you wait, or how did you determine that the program was stalling?
Why do you say that WGCNA can analyse at most 46K genes?
Hi Marge,
On our server/cluster, users have a limited time (24 hours) to use R actively in the interactive queue. That is why I wrote "it never finishes". In other words, the problem is my session/connection limit, not anything related to the program.
The 46K maximum, as Dr. Peter Langfelder explains below, is the number of genes WGCNA can handle per block. For example, you can still analyse 460K genes, but you need to split them into at least 10 blocks.
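The "at least 10 blocks" figure is a ceiling division of the gene count by the per-block limit. It is only a lower bound: blockwiseModules pre-clusters genes into blocks no larger than maxBlockSize, so the actual blocks can be smaller and more numerous. `minBlocks` below is a hypothetical helper, not a WGCNA function.

```r
# Lower bound on the number of blocks when each block holds at most
# maxBlockSize genes. Hypothetical helper, not part of WGCNA.
minBlocks <- function(nGenes, maxBlockSize = 46000) {
  ceiling(nGenes / maxBlockSize)
}

minBlocks(460000)  # 10
minBlocks(45901)   # 1
```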
Thanks a lot for the explanations (to you both).
Best,
Marge