EGSEA::egsea - Parallelization issue and "Error in array(col.rgb[, i], dim(node.rgb)[3:1]) : negative length vectors are not allowed"
neilzhao ▴ 30

1.) EGSEA testing with report generation enabled consistently gives me the following error:

Error in array(col.rgb[, i], dim(node.rgb)[3:1]) : negative length vectors are not allowed

This error occurs both with the report = TRUE setting and when calling the S4 method generateReport() manually.

2.) Parallelization does not seem to work, regardless of how many threads my processor has available. This message consistently appears at the beginning of the EGSEA analysis:

Number of used cores has changed to ... in order to avoid CPU overloading.

For example, I followed a demonstration similar to the posted guide and vignette:

# HTML report generation and parallel processing do not work 
# Packages needed
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("edgeR", "EGSEA"))

library(edgeR)
library(EGSEA)

# Load the example voom object and contrast matrices from the working directory
# https://github.com/nztao/EGSEA_issue_example-/blob/main/egsea_issue_example.RData
load("egsea_issue_example.RData")

# EGSEA Configuration
# Example: Contrast Matrix
example.cf.matrix <- cf.matrices.combined.inter$BPA$'10'

# Example: buildIdx uses Entrez IDs for Hallmark Collection and KEGG pathways
example.gs.annots <- buildIdx(entrezIDs = rownames(komen.voomqw.egsea$E),
                              species = "human",
                              msigdb.gsets = "h",
                              kegg.updated = TRUE)

# Example: base methods
example.egsea.base <- egsea.base()[c(1,2,12)]

# Ensemble testing with EGSEA
example.egsea.issue <- egsea(voom.results = komen.voomqw.egsea,
        contrasts = example.cf.matrix,
        gs.annots = example.gs.annots,
        baseGSEAs = example.egsea.base,
        sort.by = "med.rank",
        symbolsMap = komen.voomqw.egsea$genes,
        # Issue 2: parallel processing substantially underutilizes threads
        num.threads = 16,
        verbose = TRUE,
        # Issue 1: 'report = TRUE' and 'generateReport()' give the same error:
        # Error in array(col.rgb[, i], dim(node.rgb)[3:1]) :
        # negative length vectors are not allowed
        report = TRUE, report.dir = "./example_report_issue")

I am wondering whether this is an issue with my data, but I am at a loss as to where to start diagnosing it and how to fix it. So far I have tried the following checks:

Issue 1: Error in array(col.rgb[, i], dim(node.rgb)[3:1]) : negative length vectors are not allowed

  1. Checking any(is.na(komen.voomqw.egsea$genes$FeatureID)),
  2. Checking any(is.na(komen.voomqw.egsea$genes$Symbols)),
  3. Confirming that the first two column names of komen.voomqw.egsea$genes, which I pass as symbolsMap, are c("FeatureID", "Symbols"),
  4. Trying three or more base methods and different combinations of KEGG and MSigDB gs.annots
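
The first three checks in the list above can be bundled into one helper. This is only a sketch: check_symbols_map() is a hypothetical name, not part of EGSEA, and the toy data frame stands in for komen.voomqw.egsea$genes, assuming its first two columns are FeatureID and Symbols.

```r
# Hypothetical helper (not part of EGSEA): sanity-check a gene annotation
# data frame before passing it as symbolsMap.
check_symbols_map <- function(genes, expected = c("FeatureID", "Symbols")) {
  stopifnot(is.data.frame(genes), ncol(genes) >= 2)
  list(
    colnames_ok   = identical(colnames(genes)[1:2], expected),
    na_feature_id = any(is.na(genes[[1]])),
    na_symbols    = any(is.na(genes[[2]]))
  )
}

# Toy stand-in for komen.voomqw.egsea$genes (illustrative values only)
toy.genes <- data.frame(FeatureID = c("1", "2", "3"),
                        Symbols   = c("A1BG", "A2M", NA),
                        stringsAsFactors = FALSE)
toy.check <- check_symbols_map(toy.genes)
str(toy.check)
```

With the real object, the call would be check_symbols_map(komen.voomqw.egsea$genes). Any TRUE in the two NA fields, or a FALSE in colnames_ok, would be worth reporting alongside the error.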

Issue 2: Number of used cores has changed to ... in order to avoid CPU overloading.

  1. Using different processors and contrast matrices. On an i9-9900K, 16 cores changes to 4 when processing all the contrast matrices. On an AMD Ryzen Threadripper PRO 3975WX (32 cores), 60 cores changes to 11 when processing all the contrast matrices, and only 2-5% of the CPU is utilized.
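
To see what R itself reports for a given machine, independent of EGSEA, the core counts can be queried with the parallel package that ships with R. This sketch only reports the hardware and picks a conservative thread count. It makes no claim about how EGSEA arrives at its reduced number, and requested.threads is a hypothetical variable name.

```r
library(parallel)

# Logical cores (hardware threads) visible to R; may be NA on some platforms
n.logical <- detectCores(logical = TRUE)
# Physical cores; may be NA or equal to the logical count depending on the OS
n.physical <- detectCores(logical = FALSE)

cat("Logical cores: ", n.logical, "\n")
cat("Physical cores:", n.physical, "\n")

# Hypothetical choice for num.threads: leave one core free, never below 1
requested.threads <- max(1L, n.logical - 1L, na.rm = TRUE)
cat("Threads to request:", requested.threads, "\n")
```

Comparing these numbers with the value in the "Number of used cores has changed to ..." message may help show whether the reduction tracks the hardware or something else, such as the number of contrasts.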

My apologies if these are two separate issues. The lack of parallelization and the report error both cost me a significant amount of time. Any suggestions?

Thank you!

Here is my sessionInfo():

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] EGSEA_1.26.0         pathview_1.38.0      topGO_2.50.0         SparseM_1.81         GO.db_3.16.0        
 [6] graph_1.76.0         AnnotationDbi_1.60.2 IRanges_2.32.0       S4Vectors_0.36.1     gage_2.48.0         
[11] Biobase_2.58.0       BiocGenerics_0.44.0  edgeR_3.40.2         limma_3.54.1        

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                  tidyselect_1.2.0            RSQLite_2.3.0               htmlwidgets_1.6.2          
  [5] grid_4.2.2                  BiocParallel_1.32.5         R2HTML_2.3.3                munsell_0.5.0              
  [9] ScaledMatrix_1.6.0          codetools_0.2-19            mutoss_0.1-13               DT_0.27                    
 [13] KEGGdzPathwaysGEO_1.36.0    colorspace_2.1-0            knitr_1.42                  rstudioapi_0.14            
 [17] SingleCellExperiment_1.20.1 MatrixGenerics_1.10.0       Rdpack_2.4                  KEGGgraph_1.58.3           
 [21] BiocSet_1.12.1              org.Rn.eg.db_3.16.0         GenomeInfoDbData_1.2.9      mnormt_2.1.1               
 [25] hwriter_1.3.2.1             bit64_4.0.5                 rhdf5_2.42.1                vctrs_0.5.2                
 [29] generics_0.1.3              TH.data_1.1-1               xfun_0.37                   R6_2.5.1                   
 [33] doParallel_1.0.17           GenomeInfoDb_1.34.9         clue_0.3-64                 rsvd_1.0.5                 
 [37] locfit_1.5-9.7              bitops_1.0-7                rhdf5filters_1.10.1         cachem_1.0.6               
 [41] DelayedArray_0.23.0         BiocIO_1.8.0                scales_1.2.1                multcomp_1.4-23            
 [45] gtable_0.3.3                beachmat_2.14.2             org.Mm.eg.db_3.16.0         sandwich_3.0-2             
 [49] rlang_1.1.0                 GlobalOptions_0.1.2         splines_4.2.2               lazyeval_0.2.2             
 [53] PADOG_1.40.0                checkmate_2.1.0             yaml_2.3.7                  backports_1.4.1            
 [57] tools_4.2.2                 ggplot2_3.4.2               gplots_3.1.3                RColorBrewer_1.1-3         
 [61] HTMLUtils_0.1.8             sparrow_1.4.0               TFisher_0.2.0               Rcpp_1.0.10                
 [65] plyr_1.8.8                  sparseMatrixStats_1.10.0    zlibbioc_1.44.0             purrr_1.0.1                
 [69] RCurl_1.98-1.10             GetoptLong_1.0.5            viridis_0.6.2               hgu133plus2.db_3.13.0      
 [73] zoo_1.8-11                  SummarizedExperiment_1.28.0 cluster_2.1.4               magrittr_2.0.3             
 [77] data.table_1.14.8           circlize_0.4.15             mvtnorm_1.1-3               matrixStats_0.63.0         
 [81] evaluate_0.20               GSVA_1.46.0                 xtable_1.8-4                globaltest_5.52.1          
 [85] XML_3.99-0.13               hgu133a.db_3.13.0           gridExtra_2.3               EGSEAdata_1.26.0           
 [89] shape_1.4.6                 compiler_4.2.2              safe_3.38.0                 tibble_3.1.8               
 [93] KernSmooth_2.23-20          crayon_1.5.2                htmltools_0.5.4             tidyr_1.3.0                
 [97] DBI_1.1.3                   ComplexHeatmap_2.14.0       MASS_7.3-58.3               babelgene_22.9             
[101] Matrix_1.5-3                cli_3.6.0                   rbibutils_2.2.13            parallel_4.2.2             
[105] metap_1.8                   qqconf_1.3.1                GenomicRanges_1.50.2        pkgconfig_2.0.3            
[109] sn_2.1.1                    numDeriv_2016.8-1.1         plotly_4.10.1               foreach_1.5.2              
[113] annotate_1.76.0             rngtools_1.5.2              multtest_2.54.0             XVector_0.38.0             
[117] doRNG_1.8.6                 digest_0.6.29               Biostrings_2.66.0           rmarkdown_2.21             
[121] DelayedMatrixStats_1.20.0   GSEABase_1.60.0             curl_5.0.0                  gtools_3.9.4               
[125] rjson_0.2.21                GSA_1.03.2                  lifecycle_1.0.3             nlme_3.1-162               
[129] jsonlite_1.8.4              Rhdf5lib_1.20.0             viridisLite_0.4.1           fansi_1.0.4                
[133] pillar_1.9.0                ontologyIndex_2.10          lattice_0.20-45             KEGGREST_1.38.0            
[137] fastmap_1.1.0               httr_1.4.5                  plotrix_3.8-2               survival_3.5-5             
[141] glue_1.6.2                  png_0.1-8                   iterators_1.0.14            bit_4.0.5                  
[145] Rgraphviz_2.42.0            stringi_1.7.12              HDF5Array_1.26.0            blob_1.2.4                 
[149] org.Hs.eg.db_3.16.0         BiocSingular_1.14.0         caTools_1.18.2              memoise_2.0.1              
[153] mathjaxr_1.6-0              dplyr_1.1.0                 irlba_2.3.5.1            

The other contrast matrices in cf.matrices.combined.inter from the example above also seem to give the same error (e.g. cf.matrices.combined.inter$Sodium_Arsenite$'10'). I'm wondering if it has something to do with problematic KEGG pathway maps, like in this thread?

For instance, this warning appears in warnings() after running EGSEA with verbose = TRUE and contrasts = cf.matrices.combined.inter$DDE$'10':

8: In download.file(xml.url, xml.target, quiet = T) :
  URL 'https://rest.kegg.jp/get/hsa00053/kgml': status was 'SSL connect error'
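
Since the warning points at a failed KGML download, one thing worth checking is whether the KEGG REST endpoint is reachable from the same R session. A minimal sketch, assuming the URL pattern shown in the warning; kgml_url() and can_download() are hypothetical helper names, not part of EGSEA or pathview.

```r
# Build the KGML URL for a KEGG pathway ID, following the pattern in the warning
kgml_url <- function(pathway.id) {
  sprintf("https://rest.kegg.jp/get/%s/kgml", pathway.id)
}

# Try the download; return TRUE on success, FALSE on any error or warning
can_download <- function(url) {
  target <- tempfile(fileext = ".xml")
  on.exit(unlink(target))
  tryCatch({
    status <- download.file(url, target, quiet = TRUE)
    status == 0 && file.exists(target) && file.size(target) > 0
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}

url <- kgml_url("hsa00053")
cat(url, "reachable:", can_download(url), "\n")
```

If this returns FALSE for the same pathway ID, the SSL problem is likely in the network or R download setup rather than in the data, although that alone would not confirm it as the cause of the array() error.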

On a separate note, for the parallelization issue, the future.apply package helps reduce the processing time for a list of contrast matrices.
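
A minimal sketch of that approach, parallelizing over contrast matrices rather than relying on num.threads. run_one_contrast() is a hypothetical placeholder; in a real analysis its body would hold the egsea() call for a single contrast matrix. The toy list stands in for cf.matrices.combined.inter, and the code falls back to lapply() if future.apply is not installed.

```r
# Hypothetical placeholder: replace the body with the egsea() call for one
# contrast matrix. Here it only returns the matrix dimensions.
run_one_contrast <- function(cf.matrix) {
  dim(cf.matrix)
}

# Toy stand-in for a named list of contrast matrices
toy.contrasts <- list(BPA = diag(2), DDE = diag(3))

if (requireNamespace("future.apply", quietly = TRUE)) {
  # Each worker is a separate R session; this also works on Windows
  future::plan(future::multisession, workers = 2)
  results <- future.apply::future_lapply(toy.contrasts, run_one_contrast,
                                         future.seed = TRUE)
  future::plan(future::sequential)  # release the workers
} else {
  results <- lapply(toy.contrasts, run_one_contrast)
}
str(results)
```

With multisession, each worker is a fresh R process that needs the voom object and the EGSEA package loaded, so memory use is likely to grow with the number of workers.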
