Multithreading CopywriteR on SLURM cluster
smcnulty:

Hello,

I'm trying to multithread CopywriteR on a SLURM cluster using OpenMPI. My jobs seem to launch properly and run for about ten minutes, but then they die prematurely.

The final message in my SLURM output file looks like this:

Error in `[<-.data.frame`(`*tmp*`, , "total.properreads", value = list( :
  replacement element 18 has 2 rows, need 17
Calls: CopywriteR -> [<- -> [<-.data.frame
Execution halted
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[40568,1],0]
  Exit code:    1

 

Any ideas?

Edited to add: for clarification, it seems to stop after all the *properreads.bam.bai files have been generated.
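For context, the launch looks roughly like this (a sketch with placeholder paths, sample sheet, and worker count, not my exact scripts):

library(CopywriteR)
library(BiocParallel)

# Launched on the cluster with something like: mpirun -np 1 Rscript run_copywriter.R
bp.param <- SnowParam(workers = 16, type = "MPI")   # MPI-backed workers

sample.control <- data.frame(
  samples  = list.files("bams", pattern = "\\.bam$", full.names = TRUE),
  controls = "bams/control.bam")                    # placeholder control BAM

CopywriteR(sample.control     = sample.control,
           destination.folder = "copywriter_out",
           reference.folder   = "hg19_50kb",        # made earlier with preCopywriteR()
           bp.param           = bp.param)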

Tags: copywriter, slurm, openMPI
t.kuilman:

Hi,

I have seen this error message before and it was solved by removing CopywriteR and reinstalling it using Bioconductor:

remove.packages("CopywriteR")
source("https://bioconductor.org/biocLite.R")
biocLite("CopywriteR")

Is it true that you installed CopywriteR from GitHub? If you did that while the rest of the dependencies were installed via Bioconductor, then some of the dependencies might be 'broken' (wrong version) and you will get this error message. If a complete reinstallation (and an update of its dependencies) does not help, can you get back to me? Then the output of CopywriteR.log and the exact code you use to run CopywriteR would be helpful.
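To double-check which build of CopywriteR is actually picked up on the cluster, something along these lines (plain base R, nothing CopywriteR-specific) can help:

packageVersion("CopywriteR")      # installed version
packageDescription("CopywriteR")  # DESCRIPTION metadata hints at where it was installed from
sessionInfo()                     # R version and versions of all loaded dependencies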

Thomas

smcnulty:

I'm still working with the people who manage our cluster to determine which version of CopywriteR was loaded and to get the multithreading up and running, but I'll let you know what comes of it.

In the meantime, I'm able to run CopywriteR and CGHcall on individual BAM files, passing each BAM to a different node on the cluster. Yesterday, I processed 17 BAM files, resulting in 17 individual .igv files. Next, I used a simple R script to merge all 17 .igv files into a single .igv file and fed that to CGHcall as you'd instructed me previously (in case anyone is interested: https://support.bioconductor.org/p/77930/). I compared the results to those I'd obtained when I ran everything on my laptop (all 17 BAMs at once, resulting in a single combined .igv file), and was surprised to see that the results were different. I'm pretty sure that all the settings were the same (for instance, the tumor cellularity provided within CGHcall). Do you have any comment? Is this expected? If so, which result should be considered more accurate?
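In case it is useful, the merge script is roughly the following (a sketch; directory and file names are placeholders, and it assumes every per-sample .igv has identical chromosome/start/end/feature columns):

igv.files <- list.files("igv_per_sample", pattern = "\\.igv$", full.names = TRUE)
tables <- lapply(igv.files, read.table, header = TRUE, sep = "\t",
                 comment.char = "#", stringsAsFactors = FALSE)

merged <- tables[[1]][, 1:4]                       # chromosome / start / end / feature
for (tab in tables) {
  stopifnot(identical(tab[, 1:3], merged[, 1:3]))  # make sure the bins match
  merged <- cbind(merged, tab[, -(1:4), drop = FALSE])
}

write.table(merged, file = "merged_log2_read_counts.igv",
            sep = "\t", quote = FALSE, row.names = FALSE)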
t.kuilman:
Ok, I am interested to see whether you will manage to get the parallel analysis using CopywriteR working.

As to your question: it should not matter whether you run CopywriteR a number of times on single BAMs or once on the combined set, as long as the provided controls are the same in both analyses (internally the samples are analyzed completely independently, even when multiple samples are analyzed together). Did you check whether the inputs for CGHcall (the two .igv files) were identical? I am not quite sure where things went wrong; if you provide me with the exact code you run, I might be able to help you out.

Thomas

smcnulty:

Hi Thomas,

So, here's the breakdown of my work so far ...

As I said before, I'm trying to process 17 BAM files, either as one run on my laptop or as 17 separate runs on our cluster. I often get this warning at the end of the run:

Total calculation time of CopywriteR was:  31.96694

Warning message:
In plot.xy(xy.coords(x, y), type = type, ...) :
  "subset" is not a graphical parameter

However, the process still generates what looks like a complete .igv file. The .igv files generated on my laptop and on our cluster look very similar, but are not identical. Some examples:

laptop1: chr15    60000001    60050000    chr15:60000001-60050000    0.719653400974822

cluster1: chr15    60000001    60050000    chr15:60000001-60050000    0.719653400974823

laptop2: chr15    60700001    60750000    chr15:60700001-60750000    0.0973944022031557

cluster2: chr15    60700001    60750000    chr15:60700001-60750000    0.0973944022031558

The differences are tiny, but they seem to matter a great deal, because I get different results out of CGHcall.
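Here is roughly how I checked that the two files only differ by floating-point noise (paths and file names are placeholders):

laptop  <- read.table("laptop/log2_read_counts.igv",  header = TRUE, sep = "\t", comment.char = "#")
cluster <- read.table("cluster/log2_read_counts.igv", header = TRUE, sep = "\t", comment.char = "#")

identical(laptop[, 1:4], cluster[, 1:4])        # same bins in both files?
all.equal(laptop[, -(1:4)], cluster[, -(1:4)])  # values equal up to numerical tolerance?
max(abs(as.matrix(laptop[, -(1:4)]) - as.matrix(cluster[, -(1:4)])))  # largest absolute difference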

Last night I decided to make sure that all the input files were the same by comparing md5 checksums, to rule out anything getting corrupted or altered in the transfers. In doing this, I realized that there must be a slight difference in the hg19 helper files being pulled down by preCopywriteR. I figured this was the source of the problem, until I transferred the hg19 files from my laptop to the cluster: running CopywriteR with the laptop hg19 files still gave me the same "cluster" results.
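The checksum comparison itself was just something like this (the directory path is a placeholder; run on both machines and compare the output):

library(tools)
md5sum(list.files("path/to/hg19_50kb", full.names = TRUE, recursive = TRUE))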

To be clear, I'm using the same versions of CopywriteR and CGHcall in both places, though my laptop is running R version 3.2.3 and our cluster is running R version 3.2.1.

I sent you an email with the code I used in each place, since I'm not sure it's possible to attach it here.

t.kuilman:
Ok, to me it seems that CopywriteR runs as it should and that the small differences in the .igv files stem from the internal settings of the systems you are using. My first thought on the differences in the output from CGHcall is that they come from differences in the way you run the tool. Could it be that you process individual samples separately (using CGHcall) on the cluster, while in the other case you run CGHcall on the entire set of samples together? In that case one wouldn't expect the same results, as CGHcall imputes absent values, and this is (as far as I know) dependent on the values of other samples. It therefore makes a difference whether you run CGHcall on individual samples and then merge, versus running CGHcall on your merged data. Could it be that this is the problem?
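To make the comparison fair, I would run CGHcall once on the merged file, roughly along the lines of the standard CGHcall workflow (a sketch; the file name and cellularity value are placeholders):

library(CGHcall)

raw    <- make_cghRaw("merged_log2_read_counts.igv")  # merged .igv with all samples
prep   <- preprocess(raw)
norm   <- normalize(prep)
seg    <- segmentData(norm)
segn   <- postsegnormalize(seg)
calls  <- CGHcall(segn, cellularity = 0.8)            # or a vector with one value per sample
result <- ExpandCGHcall(calls, segn)

# Running these steps per sample and merging the calls afterwards can give
# different results, because preprocessing and calling use information from
# the other samples in the set.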

Thomas

smcnulty:

I think this may account for part of the problem. I was processing the samples together in CGHcall, but was missing a few on my laptop. I processed 17 BAMs on the cluster but only a subset of those on the laptop.

We also found this in the documentation for DNAcopy's segment function, which CGHcall uses for segmentation (http://www.rdocumentation.org/packages/DNAcopy/html/segment.html):

"Since the segmentation procedure uses a permutation reference distribution, R commands for setting and saving seeds should be used if the user wishes to reproduce the results."

I was just wondering if this is also the case for CopywriteR. Should I be setting a seed manually there as well to make sure my results are 100% reproducible? This will be very important for me going forward.
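For the CGHcall side I've now added something like this before the segmentation step (the seed value is arbitrary):

set.seed(20160101)        # any fixed value makes DNAcopy's permutation-based segmentation reproducible
seg <- segmentData(norm)  # 'norm' is the normalized object from the usual preprocess()/normalize() steps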

t.kuilman:
No, it is not required to set a seed for CopywriteR; on the same system, using the same input, I always get the same end result (as I said, the minute differences in the output you noted before should be a result of running CopywriteR on two different systems). I hope this solves the issue; otherwise please let me know.
