Hi everyone,
I've written a script to compute p-values from the parameter estimates of a hurdle regression fitted with the pscl package. I'm using predict() from the stats package to get the estimated means and zero probabilities needed to calculate the p-values. The features are stored in a bigmemory big.matrix x with on the order of 1e6 observations. I'm running R 3.1.1 under the Torque scheduler, and my job is getting killed for exceeding the 6 GB pvmem limit. How can I reduce the memory footprint without requesting more physical memory?
func <- function(counts, size, mean, phat)
  mapply(p.zanegbin, q = counts, size = size, munb = mean, pobs = phat)

time <- system.time({
  # Predicted means and zero probabilities for every row; as.data.frame(x[,])
  # pulls the entire big.matrix into RAM, once per call
  mu   <- predict(fit, newdata = as.data.frame(x[,]), dispersion = fit$theta^(-1))
  phat <- predprob(fit, newdata = as.data.frame(x[,]))[, 1]
  # Split the row indices into one block per core (nrows = nrow(x))
  idx <- chunk(seq_len(nrows), cores)
  # Compute the p-values in parallel, one block per worker; column 3 holds the counts
  pvals <- foreach(i = 1:cores, .combine = c) %dopar% {
    func(x[idx[[i]], 3], fit$theta, mu[idx[[i]]], phat[idx[[i]]])
  }
  adjusted <- p.adjust(pvals, method = 'fdr')
})[3] / 60
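For scale, each as.data.frame(x[,]) call materializes a full in-memory copy of the matrix, and I make two of them. A quick diagnostic sketch of the footprint (x being the big.matrix above):

print(object.size(x[,]), units = "Gb")                  # dense copy pulled out of the big.matrix
print(object.size(as.data.frame(x[,])), units = "Gb")   # the data.frame copy handed to predict()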
The tail of the job log:

=>> PBS: job killed: pvmem exceeded limit 6442450944
Terminated
-bash-4.1$ show(sprintf('Time required to estimate additional parameters : %3.2f mins',stime))
-bash: syntax error near unexpected token `sprintf'
qsub: job 2943092.mskcc-fe1.local completed
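The only fix I've thought of so far is to move predict() and predprob() inside the parallel loop, so that only one block of x is ever converted to a data frame at a time. A rough sketch of what I mean (assuming a fork-based backend like doMC so the workers can see x; with a PSOCK cluster I'd presumably need describe()/attach.big.matrix() inside the loop, and I'm assuming the columns of x carry the names the model formula expects):

pvals <- foreach(i = 1:cores, .combine = c) %dopar% {
  # Materialize only this block of rows, never as.data.frame(x[,])
  block  <- as.data.frame(x[idx[[i]], ])
  mu_b   <- predict(fit, newdata = block, dispersion = fit$theta^(-1))
  phat_b <- predprob(fit, newdata = block)[, 1]
  func(block[, 3], fit$theta, mu_b, phat_b)
}
adjusted <- p.adjust(pvals, method = 'fdr')

That should cap peak memory per worker at roughly nrows/cores rows instead of the full 1e6, at the cost of building a model frame per block. Is this a sensible way to go, or is there a more standard approach to keeping predict() out of memory trouble on data this size?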