BiocParallel and CopywriteR Error
genomic8328 ▴ 10
@genomic8328-13397
Last seen 6.7 years ago

I recently tried to use CopywriteR on a Microsoft Azure cloud Windows Server Datacenter virtual machine (128 GB RAM and 16 cores) with R 3.3.2. My input BAM files are 12.67 GB (normal) and 11 GB (tumor).

I received the following error:
Error: 'bplapply' receive data failed: error reading from connection

Can you suggest a workaround? Maybe too many BAM lines are being read at once?

Here is my code:

library("CopywriteR")
library("CopyhelpeR")
setwd("C:/Users/m/Desktop/share/data")
data.folder <- tools::file_path_as_absolute(file.path(getwd()))
preCopywriteR(output.folder=file.path(data.folder), bin.size=20000, ref.genome="hg38", prefix="chr")

list.dirs(path=file.path(data.folder), full.names=FALSE)
list.files(path=file.path(data.folder, "hg38_20kb_chr"), full.names=FALSE)
load(file=file.path(data.folder, "hg38_20kb_chr", "blacklist.rda"))
blacklist.grange

load(file=file.path(data.folder, "hg38_20kb_chr", "GC_mappability.rda"))
GC.mappa.grange[1001:1011]
bp.param <- SnowParam(workers = 15, type ="SOCK")
bp.param

path <- "C:/Users/m/Desktop/share/data"
# Escape the dot so the pattern matches a literal ".bam" suffix
samples <- list.files(path=path, pattern="tumor\\.bam$", full.names=TRUE)
controls <- list.files(path=path, pattern="normal\\.bam$", full.names=TRUE)
sample.control <- data.frame(samples,controls)

CopywriteR(sample.control = sample.control, destination.folder = file.path(data.folder), reference.folder = file.path(data.folder, "hg38_20kb_chr"), bp.param = bp.param)
bioconductor biocparallel copywriter • 1.9k views
t.kuilman ▴ 170
@tkuilman-6868
Last seen 2.4 years ago
Netherlands

I am not sure whether this is an issue with CopywriteR; I think it might be an issue with BiocParallel (the package in which the bplapply function is defined) and/or a memory issue. I hope someone else can help with this.

@martin-morgan-1513
Last seen 3 months ago
United States

My guess is that the amount of data being returned by the workers is too large to be represented in a single serialized vector; I think the limit is probably 2^31 - 1 elements. Calling traceback() right after the error might help show where things are going wrong, and using SerialParam() would be a workaround (though obviously thwarting parallel evaluation).
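
A minimal sketch of that workaround, assuming the objects from the question (sample.control, data.folder) are already defined; only bp.param changes. The small bplapply() call is just a hypothetical sanity check that the serial back-end works, not part of CopywriteR, and the CopywriteR() call is left commented out because it needs the original BAM files.

```r
library(BiocParallel)

# The limit guessed at above: the largest length of a regular
# (non-long) vector in R.
.Machine$integer.max == 2^31 - 1

# Serial back-end: everything is evaluated in the main R process, so no
# results have to be serialized and sent back over a socket connection.
bp.param <- SerialParam()

# Hypothetical sanity check that the back-end evaluates jobs.
res <- bplapply(1:3, function(i) i^2, BPPARAM = bp.param)

# Same call as in the question, with only bp.param swapped:
# CopywriteR(sample.control = sample.control,
#            destination.folder = file.path(data.folder),
#            reference.folder = file.path(data.folder, "hg38_20kb_chr"),
#            bp.param = bp.param)

# If the SnowParam run is tried again and fails, calling this directly
# after the error shows which internal call raised it:
# traceback()
```

This will be slower than 15 workers, but if the serial run completes it would point to the worker-to-master transfer rather than CopywriteR itself.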
