A problem with csaw that I don't think is a problem with csaw
2
0
Entering edit mode
sorjuelal • 0
@sorjuelal-18470
Last seen 4.7 years ago

Hello,

I updated to R 3.5.1 on an ubuntu 16.04.5 LTS machine. I am using bioconductor 3.8 packages. Once I run windowCounts() from csaw I get a *** caught segfault *** address 0x7, cause 'memory not mapped'.

Here is example code using a bam file from the csaw package:

> library(csaw)
> pe.bam <- system.file("exdata", "pet.bam", package = "csaw")
> pe.param <- readParam(max.frag=400, pe="both")
> demo <- windowCounts(pe.bam, ext=250, param=pe.param)

 *** caught segfault ***
address 0x7, cause 'memory not mapped'

Traceback:
 1: .getPairedEnd(bam.file, where = where, param = param)
 2: (function (bam.file, where, param, init.ext, final.ext, outlen,     bin, shift, width, spacing, total.pts, at.start) {    if (param$pe != "both") {        reads <- .getSingleEnd(bam.file, where = where, param = param)        extended <- .extendSE(reads, ext = init.ext, final = final.ext,             chrlen = outlen)        frag.start <- extended$start        frag.end <- extended$end        rlengths <- cbind(c(mean(reads$forward$qwidth), mean(reads$reverse$qwidth)),             c(length(reads$forward$qwidth), length(reads$reverse$qwidth)))    }    else {        out <- .getPairedEnd(bam.file, where = where, param = param)        if (bin) {            mid <- as.integer(out$pos + out$size/2)            mid <- pmin(mid, outlen)            frag.end <- frag.start <- mid        }        else {            checked <- .coerceFragments(out$pos, out$pos + out$size -                 1L, final = final.ext, chrlen = outlen)            frag.start <- checked$start            frag.end <- checked$end        }        rlengths <- c(mean(out$size), length(out$size))    }    right <- width - shift - 1L    left <- shift    frag.start <- frag.start - right    frag.end <- frag.end + left    out <- .Call(cxx_get_rle_counts, frag.start, frag.end, total.pts,         spacing, at.start)    return(list(counts = out, totals = length(frag.start), lengths = rlengths))})(bam.file = dots[[1L]][[1L]], init.ext = dots[[2L]][[1L]],     where = new("GRanges", seqnames = new("Rle", values = 1L,         lengths = 1L, elementMetadata = NULL, metadata = list()),         ranges = new("IRanges", start = 1L, width = 200L, NAMES = NULL,             elementType = "ANY", elementMetadata = NULL, metadata = list()),         strand = new("Rle", values = 3L, lengths = 1L, elementMetadata = NULL,             metadata = list()), seqinfo = new("Seqinfo", seqnames = "chrA",             seqlengths = NA_integer_, is_circular = NA, genome = NA_character_),         elementMetadata = new("DataFrame", rownames = NULL, nrows = 1L,             listData = list(), elementType = "ANY", elementMetadata = NULL,             metadata = list()), elementType = "ANY", metadata = list()),     param = new("readParam", pe = "both", max.frag = 400L, dedup = FALSE,         minq = NA_integer_, forward = NA, restrict = character(0),         discard = new("GRanges", seqnames = new("Rle", values = integer(0),             lengths = integer(0), elementMetadata = NULL, metadata = list()),             ranges = new("IRanges", start = integer(0), width = integer(0),                 NAMES = NULL, elementType = "ANY", elementMetadata = NULL,                 metadata = list()), strand = new("Rle", values = integer(0),                 lengths = integer(0), elementMetadata = NULL,                 metadata = list()), seqinfo = new("Seqinfo",                 seqnames = character(0), seqlengths = integer(0),                 is_circular = logical(0), genome = character(0)),             elementMetadata = new("DataFrame", rownames = NULL,                 nrows = 0L, listData = list(), elementType = "ANY",                 elementMetadata = NULL, metadata = list()), elementType = "ANY",             metadata = list()), BPPARAM = new("SerialParam",             .xData = <environment>)), final.ext = NA_integer_,     outlen = c(chrA = 200L), bin = FALSE, shift = 0L, width = 50L,     spacing = 50L, total.pts = 4L, at.start = TRUE)
 3: .mapply(.FUN, dots, .MoreArgs)
 4: FUN(...)
 5: doTryCatch(return(expr), name, parentenv, handler)
 6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 7: tryCatchList(expr, classes, parentenv, handlers)
 8: tryCatch({    FUN(...)}, error = handle_error)
 9: withCallingHandlers({    tryCatch({        FUN(...)    }, error = handle_error)}, warning = handle_warning)
10: FUN(...)
11: FUN(X[[i]], ...)
12: lapply(X, FUN_, ...)
13: bplapply(X = seq_along(ddd[[1L]]), wrap, .FUN = FUN, .ddd = ddd,     .MoreArgs = MoreArgs, BPREDO = BPREDO, BPPARAM = BPPARAM)
14: bplapply(X = seq_along(ddd[[1L]]), wrap, .FUN = FUN, .ddd = ddd,     .MoreArgs = MoreArgs, BPREDO = BPREDO, BPPARAM = BPPARAM)
15: bpmapply(FUN = .window_counts, bam.file = bam.files, init.ext = ext.data$ext,     MoreArgs = list(where = where, param = param, final.ext = ext.data$final,         outlen = outlen, bin = bin, shift = shift, width = width,         spacing = spacing, total.pts = total.pts, at.start = at.start),     BPPARAM = param$BPPARAM, SIMPLIFY = FALSE)
16: bpmapply(FUN = .window_counts, bam.file = bam.files, init.ext = ext.data$ext,     MoreArgs = list(where = where, param = param, final.ext = ext.data$final,         outlen = outlen, bin = bin, shift = shift, width = width,         spacing = spacing, total.pts = total.pts, at.start = at.start),     BPPARAM = param$BPPARAM, SIMPLIFY = FALSE)
17: windowCounts(pe.bam, ext = 250, param = pe.param)
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)

Here is my session info:

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.5 LTS

Matrix products: default
BLAS: /usr/local/R/R-3.5.1/lib/libRblas.so
LAPACK: /usr/local/R/R-3.5.1/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_CA.UTF-8    LC_NUMERIC=C            LC_TIME=C              
 [4] LC_COLLATE=en_CA.UTF-8  LC_MONETARY=C           LC_MESSAGES=en_CA.UTF-8
 [7] LC_PAPER=C              LC_NAME=C               LC_ADDRESS=C           
[10] LC_TELEPHONE=C          LC_MEASUREMENT=C        LC_IDENTIFICATION=C    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] csaw_1.16.0                 SummarizedExperiment_1.12.0
 [3] DelayedArray_0.8.0          BiocParallel_1.16.1        
 [5] matrixStats_0.54.0          Biobase_2.42.0             
 [7] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
 [9] IRanges_2.16.0              S4Vectors_0.20.1           
[11] BiocGenerics_0.28.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0               compiler_3.5.1           XVector_0.22.0          
 [4] prettyunits_1.0.2        GenomicFeatures_1.34.1   bitops_1.0-6            
 [7] tools_3.5.1              zlibbioc_1.28.0          progress_1.2.0          
[10] biomaRt_2.38.0           digest_0.6.18            bit_1.1-14              
[13] RSQLite_2.1.1            memoise_1.1.0            lattice_0.20-38         
[16] pkgconfig_2.0.2          rlang_0.3.0.1            Matrix_1.2-15           
[19] DBI_1.0.0                GenomeInfoDbData_1.2.0   rtracklayer_1.42.1      
[22] httr_1.3.1               stringr_1.3.1            hms_0.4.2               
[25] Biostrings_2.50.1        locfit_1.5-9.1           bit64_0.9-7             
[28] grid_3.5.1               R6_2.3.0                 AnnotationDbi_1.44.0    
[31] XML_3.98-1.16            limma_3.38.2             edgeR_3.24.0            
[34] magrittr_1.5             blob_1.1.1               GenomicAlignments_1.18.0
[37] Rsamtools_1.34.0         assertthat_0.2.0         stringi_1.2.4           
[40] RCurl_1.95-4.11          crayon_1.3.4

Any input is appreciated,

Stephany  

csaw caught segfault • 1.6k views
ADD COMMENT
0
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 5 hours ago
The city by the bay

That does not look good. It seems like a problem with some of the C++ code for handling paired-end reads. The question is whether this is due to code that I wrote or whether it is due to the HTS library. Can you:

  1. Copy the above example code into a separate R script, e.g., blah.R, and
  2. From the command line, run R CMD BATCH -d valgrind --no-save blah.R, and
  3. Show the output of the resulting blah.Rout file?

You should already have valgrind installed if you're on an Ubuntu system. 

ADD COMMENT
0
Entering edit mode
sorjuelal • 0
@sorjuelal-18470
Last seen 4.7 years ago

The first half:

> pe.bam <- system.file("exdata", "pet.bam", package = "csaw")
> pe.param <- readParam(max.frag=400, pe="both")
> demo <- windowCounts(pe.bam, ext=250, param=pe.param)
==46718== Invalid read of size 4
==46718==    at 0x42A52870: bam_cigar2rlen (sam.c:331)
==46718==    by 0x427FBB7E: BamRead::extract_data(AlignData&) const (bam_utils.cpp:54)
==46718==    by 0x42817D31: extract_pair_data (pair_reads.cpp:129)
==46718==    by 0x4F2D709: R_doDotCall (dotcode.c:596)
==46718==    by 0x4F67A20: bcEval (eval.c:7289)
==46718==    by 0x4F7137F: Rf_eval (eval.c:624)
==46718==    by 0x4F72CE0: R_execClosure (eval.c:1773)
==46718==    by 0x4F68A16: bcEval (eval.c:6749)
==46718==    by 0x4F7137F: Rf_eval (eval.c:624)
==46718==    by 0x4F72CE0: R_execClosure (eval.c:1773)
==46718==    by 0x4F75C85: R_forceAndCall (eval.c:1840)
==46718==    by 0x4FA2E26: do_mapply (mapply.c:105)
==46718==  Address 0x7 is not stack'd, malloc'd or (recently) free'd
==46718== 

 *** caught segfault ***
address 0x7, cause 'memory not mapped'

Traceback:
 1: .getPairedEnd(bam.file, where = where, param = param)
 ...
ADD COMMENT
0
Entering edit mode

That's all I needed to know. Looks like it's a failure with parsing the CIGAR string... though I don't know how this could happen. Please contact me offline at the csaw maintainer address if you want to be involved in further debugging - I will create a modified version of the package for you to run to see whether the same error is triggered.

ADD REPLY
0
Entering edit mode

FYI if you install a debugger version of the package:

BiocManager::install("LTLA/csaw@debugger")

... the example will print the read name and cigar length up to the point that it fails. This'll tell us which read is the problem.

ADD REPLY

Login before adding your answer.

Traffic: 826 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6