Hi there,
I am currently working on my master's thesis about the influence of copy number changes on the response to a new treatment. My data consists of about 100 samples, measured with Nimblegen 2.1M CGH arrays. Each of those arrays contains about 2.1 million features. After preprocessing my data as described in the snapCGH vignette (http://www.bioconductor.org/packages/3.3/bioc/vignettes/snapCGH/inst/doc/snapCGHguide.pdf), I tried to use the function runBioHMM
to segment my data. Unfortunately, it seems like snapCGH can't handle such large datasets, since I always get the following error:
Error: segfault from C stack overflow
I already downloaded the Package Source from https://www.bioconductor.org/packages/release/bioc/html/snapCGH.html and spent quite a few hours in looking through the source code of the package, trying to locate the source of the error and I'm pretty sure it has something to do with the function runNelderMead from snapCGH/src/optimizer.c. Sadly I'm not that familiar with C programming so I decided to ask for help here.
Below you can find a minimal working example that produces the error. The function generateSegList
creates an Object similar to MA2
from the vignette mentioned above, just with more rows (corresponding to more features on the microarray). Obviously the function works for 10000 features, but does not for 1000000 features.
library("snapCGH") generateSegList <- function(nrows){ state <- rpred <- prob <- M.predicted <- dispersion <- variance <- NULL M.observed <- matrix(rnorm(2 * nrows, 0, 1), nrows, 2) colnames(M.observed) <- c("sample1", "sample2") genes <- data.frame( "Block" = rep(1L, nrows), "Row" = rep(1:sqrt(nrows), each = sqrt(nrows)), "Col" = rep(1:sqrt(nrows), sqrt(nrows)), "ID" = paste("ID", 1:nrows, sep = ""), "Name" = paste("Name", 1:nrows, sep = ""), "Position" = rep(as.integer(seq(1,2e8, , length.out = nrows/10)), 10) / 1e6, "Chr" = rep(1:10, each = nrows / 10), "Status" = rep("Probe", nrows), stringsAsFactors = FALSE) attributes(genes$Status) <- list("values" = "Probe", "col" = "black") SegList <- new("SegList") SegList$M.observed <- M.observed SegList$genes <- genes return(SegList) } TestSegList1 <- generateSegList(100*100) TestSegList2 <- generateSegList(1000*1000) SegInfo.Bio <- runBioHMM(TestSegList1) # OUTPUT # sample is 1 Chromosomes: 1 2 3 4 5 6 7 8 9 10 # sample is 2 Chromosomes: 1 2 3 4 5 6 7 8 9 10 SegInfo.Bio <- runBioHMM(TestSegList2) # OUTPUT # sample is 1 Chromosomes: 1 # Fehler: segmentation fault wegen eines C-Stack Überlaufes
Any suggestions on how to fix this error would be appreciated.
Thanks,
Tobias Schmidt
sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 14.04.3 LTS locale: [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8 [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] snapCGH_1.40.0 DNAcopy_1.44.0 limma_3.26.3 loaded via a namespace (and not attached): [1] Rcpp_0.12.2 BiocInstaller_1.20.1 RColorBrewer_1.1-2 [4] plyr_1.8.3 tools_3.2.2 zlibbioc_1.16.0 [7] digest_0.6.8 GLAD_2.34.0 RSQLite_1.0.0 [10] annotate_1.48.0 preprocessCore_1.32.0 gtable_0.1.2 [13] lattice_0.20-33 DBI_0.3.1 parallel_3.2.2 [16] proto_0.3-10 pixmap_0.4-11 genefilter_1.52.0 [19] stringr_1.0.0 cluster_2.0.3 IRanges_2.4.3 [22] S4Vectors_0.8.3 stats4_3.2.2 multtest_2.26.0 [25] grid_3.2.2 Biobase_2.30.0 AnnotationDbi_1.32.0 [28] XML_3.98-1.3 survival_2.38-3 ggplot2_1.0.1 [31] reshape2_1.4.1 magrittr_1.5 scales_0.3.0 [34] MASS_7.3-45 splines_3.2.2 BiocGenerics_0.16.1 [37] xtable_1.8-0 strucchange_1.5-1 colorspace_1.2-6 [40] sandwich_2.3-4 aCGH_1.48.0 stringi_1.0-1 [43] affy_1.48.0 tilingArray_1.48.0 munsell_0.4.2 [46] vsn_3.38.0 zoo_1.7-12 affyio_1.40.0
Thank you very much! That solved the problem.