analyzeChr seems to just freeze during f-stat calculation. "top" tells me that R is still running but using 0% of processor, so I don't think it's doing much!
My code:
library(derfinder)
library(GenomicFeatures)
library(BiocParallel)
cores <- 10
perms <- 1
fstat <- 10
options(species = 'canis_familiaris')
chroms <- c(1:38, 'X')
files <- rawFiles(datadir="~/transcriptome/pituitary/aligned/cf3-N3-ensembl/", samplepatt="tophat-N3-ensembl-*")
tameIds <- read.table("/home/hekman2/Dropbox/Research/transcriptome/pituitary/notes/samples-pit-tame.txt")
tameFileNames <- lapply(tameIds, function(id) { return(paste("tophat-N3-ensembl-", id, sep="")) } )
groups <- as.factor(sapply(names(files), function(filename) { ifelse(grepl(filename, tameFileNames), "tame", "aggr") }))
fullCov <- fullCoverage(files, chrs = chroms)
sampleDepths <- sampleDepth(collapseFullCoverage(fullCov), 1) # only worthwhile if we analyzed all chromosomes
models <- makeModels(sampleDepths, testvars = groups)
filteredCov <- lapply(fullCov, filterData, cutoff = 2)
dogTx <- makeTranscriptDbFromBiomart(biomart="ensembl",
dataset="cfamiliaris_gene_ensembl",
transcript_ids=NULL,
circ_seqs=DEFAULT_CIRC_SEQS,
filters="",
id_prefix="ensembl_",
host="www.biomart.org",
port=80,
miRBaseBuild=NA)
originalWd <- getwd()
setwd(file.path(originalWd, derResults))
currentChrom <- "chr1"
analyzeChr(chr = currentChrom, filteredCov[[currentChrom]], models, mc.cores=cores, groupInfo = groups, cutoffFstat = fstat, cutoffType = 'manual', nPermute = perms, seeds = 19731107 + seq_len(perms), lowMemDir = file.path(tempdir(), currentChrom), BPPARAM.custom = MulticoreParam(workers = cores), verbose = TRUE, runAnnotation = TRUE, writeOutput = TRUE, returnOutput = FALSE, txdb=dogTx)
Relevant output:
2015-03-23 16:51:51 analyzeChr: Pre-processing the coverage data 2015-03-23 16:52:03 preprocessCoverage: splitting the data 2015-03-23 16:53:24 analyzeChr: Calculating statistics 2015-03-23 16:53:24 calculateStats: calculating the F-statistics
R version:
R Under development (unstable) (2015-03-19 r68032) -- "Unsuffered Consequences" Copyright (C) 2015 The R Foundation for Statistical Computing Platform: x86_64-unknown-linux-gnu (64-bit)
Session info:
> devtools::session_info() Session info ------------------------------------------------------------------- setting value version R Under development (unstable) (2015-03-19 r68032) system x86_64, linux-gnu ui X11 language (EN) collate en_US.UTF-8 tz America/Chicago Packages ----------------------------------------------------------------------- package * version date source acepack * 1.3-3.3 2014-11-24 CRAN (R 3.3.0) AnnotationDbi * 1.29.20 2015-03-23 Bioconductor Biobase * 2.27.2 2015-03-23 Bioconductor BiocGenerics * 0.13.8 2015-03-23 Bioconductor BiocParallel * 1.1.19 2015-03-23 Bioconductor biomaRt * 2.23.5 2015-03-23 Bioconductor Biostrings * 2.35.11 2015-03-23 Bioconductor bitops * 1.0-6 2013-08-17 CRAN (R 3.3.0) bumphunter * 1.7.6 2015-03-23 Bioconductor cluster * 2.0.1 2015-01-31 CRAN (R 3.3.0) codetools * 0.2-11 2015-03-10 CRAN (R 3.3.0) colorspace * 1.2-6 2015-03-11 CRAN (R 3.3.0) DBI * 0.3.1 2014-09-24 CRAN (R 3.3.0) derfinder 1.1.17 2015-03-23 Bioconductor derfinderHelper * 1.1.6 2015-03-23 Bioconductor devtools 1.7.0 2015-01-17 CRAN (R 3.3.0) digest * 0.6.8 2014-12-31 CRAN (R 3.3.0) doRNG * 1.6 2014-03-07 CRAN (R 3.3.0) foreach * 1.4.2 2014-04-11 CRAN (R 3.3.0) foreign * 0.8-63 2015-02-20 CRAN (R 3.3.0) Formula * 1.2-0 2015-01-20 CRAN (R 3.3.0) futile.logger * 1.4 2015-03-21 CRAN (R 3.3.0) futile.options * 1.0.0 2010-04-06 CRAN (R 3.3.0) GenomeInfoDb * 1.3.15 2015-03-23 Bioconductor GenomicAlignments * 1.3.32 2015-03-23 Bioconductor GenomicFeatures * 1.19.33 2015-03-23 Bioconductor GenomicFiles * 1.3.14 2015-03-23 Bioconductor GenomicRanges * 1.19.47 2015-03-23 Bioconductor ggplot2 * 1.0.1 2015-03-17 CRAN (R 3.3.0) gtable * 0.1.2 2012-12-05 CRAN (R 3.3.0) Hmisc * 3.15-0 2015-02-16 CRAN (R 3.3.0) IRanges * 2.1.43 2015-03-23 Bioconductor iterators * 1.0.7 2014-04-11 CRAN (R 3.3.0) lambda.r * 1.1.7 2015-03-20 CRAN (R 3.3.0) lattice * 0.20-30 2015-02-22 CRAN (R 3.3.0) latticeExtra * 0.6-26 2013-08-15 CRAN (R 3.3.0) locfit * 1.5-9.1 2013-04-20 CRAN (R 3.3.0) MASS * 7.3-40 2015-03-21 CRAN (R 3.3.0) Matrix * 1.1-5 2015-01-18 CRAN (R 3.3.0) matrixStats * 0.14.0 2015-02-14 CRAN (R 3.3.0) munsell * 0.4.2 2013-07-11 CRAN (R 3.3.0) nnet * 7.3-9 2015-02-11 CRAN (R 3.3.0) pkgmaker * 0.22 2014-05-14 CRAN (R 3.3.0) plyr * 1.8.1 2014-02-26 CRAN (R 3.3.0) proto * 0.3-10 2012-12-22 CRAN (R 3.3.0) qvalue * 1.43.0 2015-03-23 Bioconductor RColorBrewer * 1.1-2 2014-12-07 CRAN (R 3.3.0) Rcpp * 0.11.5 2015-03-06 CRAN (R 3.3.0) RCurl * 1.95-4.5 2014-12-28 CRAN (R 3.3.0) registry * 0.2 2012-01-24 CRAN (R 3.3.0) reshape2 * 1.4.1 2014-12-06 CRAN (R 3.3.0) rngtools * 1.2.4 2014-03-06 CRAN (R 3.3.0) rpart * 4.1-9 2015-02-24 CRAN (R 3.3.0) Rsamtools * 1.19.47 2015-03-23 Bioconductor RSQLite * 1.0.0 2014-10-25 CRAN (R 3.3.0) rstudioapi * 0.2 2014-12-31 CRAN (R 3.3.0) rtracklayer * 1.27.9 2015-03-23 Bioconductor S4Vectors * 0.5.22 2015-03-23 Bioconductor scales * 0.2.4 2014-04-22 CRAN (R 3.3.0) stringr * 0.6.2 2012-12-06 CRAN (R 3.3.0) survival * 2.38-1 2015-02-24 CRAN (R 3.3.0) XML * 3.98-1.1 2013-06-20 CRAN (R 3.3.0) xtable * 1.7-4 2014-09-12 CRAN (R 3.3.0) XVector * 0.7.4 2015-03-23 Bioconductor zlibbioc * 1.13.2 2015-03-23 Bioconductor
Thanks for all your help in the past!
Jessica
Hi Jessica,
I haven't been able to reproduce your problem. Can you try running the following code ? It will check that the cluster type is correctly being selected and that you can get results using the example data in the package.
This is what the output should look like:
Cheers,
Leo
From your timestamps, it should proceed within a second, but instead it has frozen.
Do you remember when I had trouble with analyzeChr() before and we ended up figuring out I had to set BPPARAM.custom to get around it? This feels like that. Some weird thing with processor allocation or something.
Jessica
Hi Jessica,
Thanks for the reminder about the old problem. I found the old threads: derfinder: analyzeChr() breaks on chromosomes > 22 and bplapply() fails when BPPARAM = SnowParam() is specified
Can you try running the code from ? Thanks.
It will generate output like this:
If it doesn't work, then please create a new issue with the BiocParallel tag. You can refer them back to this thread and we'll both learn from whatever solution they find.
Cheers,
Leo
SerialParam() worked; the other two did not.
I then re-tested your derfinder code with SerialParam():
It worked beautifully. I have my workaround and will close this issue. I'll consider whether or not to open one with the BiocParallel folks, but as I recall last time they just told me to use what worked, and since I have something that works it may not be worth anyone's time for me to do that.
Thank you! I'll let you know how it goes running derfinder with my data. Fingers crossed.
Jessica
Well, if you use SerialParam() your analysis is going to take much longer because you won't be running anything in parallel. The upper bound limit is 10 times the time it took before (because you were using 10 cores). It's actually a bit better than 10x, but the point remains.
You could alternatively manually run the functions analyzeChr() runs. That would allow you to use preprocessCoverage() to generate all the chunks you need. Then, you could submit a cluster job for each chunk to run calculateStats(), then merge the results. Then do something similar with calculatePvalues().
However, the above approach is complicated and it might be much easier to (a) figure out why MulticoreParam() and SnowParam() don't work in your cluster, or (b) try to use another *Param() such as BatchJobsParam() or DoparParam().
Cheers,
Leo
In the SnowParam report from earlier one possibility was blocked ports, with the OP saying they'd check on this... Another possibility with SnowParam is that a different R is being started; you might try writing code in the worker that sends output to a file (cat("hello world\n", file="my_test.R") -- nothing fancy, and then work up to providing useful debugging information, e.g., about .libPaths(). Also it helps alot to simplify things, so doing your testing in a new R session with no unneeded packages loaded, and removing BiocParallel from the picture (at least until it becomes apparent that that is what the problem is) by calling functions from the parallel and snow packages directly. It's always helpful to have complete output, rather than 'it doesn't work'.
snow and multicore are quite different beasts, so it's interesting that they have both apparently stopped working?
Sorry for the delayed response; I hit the six posts in six hours limit and had to wait.
Here are the results. I had to control-C out of the first two bplapply() calls because they just hung.
I very much believe the problem is with a blocked port. My admin has a history of blocking ports quite aggressively. Additionally, when I run analyzeChr() with SerialParam(), I get this error:
...which also sounds like a blocked-port problem to me.
I can talk to my admin about opening up some ports -- but which ones?
Thanks for your help!
Jessica
Thanks for that so far; what about in a new session with no packages loaded
Also, if these do not work, can you try outside R-studio, just on the command line? For me these all work, and
identical(x, y)
returns TRUE.Code works for me, lets see what happens in Jessica's case. My output for local and cluster environments is shown below:
I'm curious if MulticoreParam() failing could be an issue related to Jessica using R 3.3 (devel) instead of 3.2 (alpha, matches current BioC-devel). A recent BioC-devel thread https://stat.ethz.ch/pipermail/bioc-devel/2015-March/007228.html was solved by using R 3.2 I haven't tested myself if the code in this thread works with R 3.3 since http://r.research.att.com/ doesn't have a snowleopard 3.3 version out yet (aye, I'm too lazy to compile R myself).
I just installed 3.2: R version 3.2.0 alpha (2015-03-26 r68103). In a new session with nothing loaded:
Again parallel:makeCluster() hung and I had to exit with ^C. I conclude that 3.2 vs 3.3 isn't the problem.
Jessica
(I had to control-C out of the second command, and then obviously could not continue with that set.)
I had to control-C again; it does not like parallel::makeCluster()!
None of this is in R studio (I hate graphical interfaces); it is all on the command line, where I always work. It was fresh, nothing loaded, nothing done before the code I copied above.
Jessica
Sorry for my delayed reply. I think at the end of the day Jessica you're right, the sockets R is trying to use are blocked. The initial mystery (to me) was why MulticoreParam stopped working, but it turns out that the implementation has changed and now also relies on sockets for some aspect of communication. I understand that the environment variable R_PARALLEL_PORT can be used to determine the port on which communication occurs, so if you can determine the ports that remain open, starting R with something like
R_PARALLEL_PORT=12345 R --vanilla
might do the trick. I'd asked about rstudio because it seems from your session_info that a package rstudioapi is always present...I believe that's because `devtools` imports `rstudioapi`.See https://github.com/hadley/devtools/blob/master/DESCRIPTION#L26
On a clean session:
I didn't know about R_PARALLEL_PORT. Sounds like the best lead!
That's so helpful, thanks! I'm going to be out of the country for a week starting today, but when I get back I'll sit down and try to figure this out. I'll post whatever I find here (hopefully an answer about what ports to open, for the benefit of future answer seekers).
OK! I contacted my sysadmin for help in finding an appropriate port to use with R_PARALLEL_PORT, and he said "oh, looks like loopback is blocked -- I opened it back up, see if things work now." And now they do.
So the answer is: check loopback and make sure it's open.
Thanks to both of you for all your help. Leo, derfinder completed when I did a test run with one permutation. I'm doing a real run now with 200 perms, and if/when it completes I'll let you know.
Jessica
Awesome! Good luck with the rest of the analysis.