I keep getting segfaults when using `swappedDrops()` on our Linux cluster: `caught segfault, cause 'memory not mapped'`. It works fine on my MacBook, but we cannot find an explanation or a solution on the cluster. I have not had any problems with other functions of the package. I tried several combinations of R and DropletUtils versions, all giving the same error, including DropletUtils_1.10.3 under R version 4.0.3 and DropletUtils_1.12.0 under R version 4.1.0. It fails with both my own files and the example code from the help section. Any help is highly appreciated.
library(DropletUtils)
## simulated data
curfiles <- DropletUtils:::simSwappedMolInfo(tempfile(), nsamples=3)
out <- swappedDrops(curfiles)
*** caught segfault ***
address (nil), cause 'unknown'
*** Error in `/apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/bin/exec/R': malloc(): memory corruption: 0x00000000231a6cd0 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x82aa6)[0x2ad4259d7aa6]
/lib64/libc.so.6(__libc_malloc+0x4c)[0x2ad4259da6fc]
/apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libR.so(R_AllocStringBuffer+0xa1)[0x2ad424d976fa]'
...
## own data
test_1 <- "/scratch/Sample_1/outs/molecule_info.h5"
test_2 <- "/scratch/Sample_2/outs/molecule_info.h5"
out <- swappedDrops(c(test_1, test_2), get.swapped = TRUE)
*** caught segfault ***
address 0x226e0000225c, cause 'memory not mapped'
Traceback:
1: find_swapped(cells, genes, umis, nreads, min.frac, get.diagnostics)
2: removeSwappedDrops(cells = cells, umis = umis, genes = genes, nreads = nreads, ref.genes = ref.genes, ...)
3: swappedDrops(c(test_1, test_2), get.swapped = TRUE)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
sessionInfo( )
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.8 (Maipo)
Matrix products: default
BLAS: /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libRblas.so
LAPACK: /apps/languages/R/4.1.0/el7/AVX512/gnu-7.3/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] DropletUtils_1.12.0 SingleCellExperiment_1.14.0
[3] SummarizedExperiment_1.22.0 Biobase_2.52.0
[5] GenomicRanges_1.44.0 GenomeInfoDb_1.28.0
[7] IRanges_2.26.0 S4Vectors_0.30.0
[9] BiocGenerics_0.38.0 MatrixGenerics_1.4.0
[11] matrixStats_0.58.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 edgeR_3.34.0
[3] XVector_0.32.0 zlibbioc_1.38.0
[5] BiocParallel_1.26.0 lattice_0.20-44
[7] tools_4.1.0 DelayedMatrixStats_1.14.0
[9] sparseMatrixStats_1.4.0 grid_4.1.0
[11] scuttle_1.2.0 rhdf5_2.36.0
[13] dqrng_0.3.0 R.oo_1.24.0
[15] HDF5Array_1.20.0 Matrix_1.3-3
[17] GenomeInfoDbData_1.2.6 Rhdf5lib_1.14.0
[19] R.utils_2.10.1 rhdf5filters_1.4.0
[21] bitops_1.0-7 RCurl_1.98-1.3
[23] limma_3.48.0 DelayedArray_0.18.0
[25] compiler_4.1.0 R.methodsS3_1.8.1
[27] locfit_1.5-9.4 beachmat_2.8.0
Thank you, Aaron. I get this message on the terminal after running your command:
R CMD BATCH --no-save -d valgrind test.R
And this is the output written to `test.Rout`:
Bit of a guess, but I see `AVX512` in the path to R. The combination of that and "Unrecognised instruction" makes me wonder if you're running code compiled with AVX512 instructions on a CPU that doesn't support them. I've seen that happen in a cluster environment where code is compiled on a machine newer than some of the nodes.

Maybe try re-installing the package on the cluster node before running the example code. That might help determine if that's the issue.
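One way to test the AVX512 hypothesis is to ask each node which CPU feature flags it reports. The following is a minimal sketch, assuming a Linux node that exposes `/proc/cpuinfo` with a `flags` line (as x86 kernels do); `cpu_flags()` and `has_avx512()` are hypothetical helper names, not part of R or DropletUtils.

```r
# Minimal sketch (Linux/x86 assumed): report AVX512 support on the current node.
# cpu_flags() and has_avx512() are illustrative helpers, not from any package.

# Parse the "flags" line(s) of /proc/cpuinfo-style text into a character vector.
cpu_flags <- function(cpuinfo_lines) {
    flag_lines <- grep("^flags[[:space:]]*:", cpuinfo_lines, value = TRUE)
    if (length(flag_lines) == 0L) return(character(0))
    flags <- sub("^flags[[:space:]]*:[[:space:]]*", "", flag_lines)
    unique(unlist(strsplit(flags, "[[:space:]]+")))
}

# TRUE if any reported flag belongs to the AVX512 family (avx512f, avx512dq, ...).
has_avx512 <- function(cpuinfo_lines) {
    any(startsWith(cpu_flags(cpuinfo_lines), "avx512"))
}

if (file.exists("/proc/cpuinfo")) {
    info <- readLines("/proc/cpuinfo")
    cat("Node:", Sys.info()[["nodename"]], "\n")
    cat("AVX512 flags:", grep("^avx512", cpu_flags(info), value = TRUE), "\n")
    cat("Any AVX512 support:", has_avx512(info), "\n")
}
```

Running this once on the login node and once inside a submitted job would show whether the two report different instruction sets.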
Thank you for your suggestion. I have forwarded it to the IT team of our cluster; let's see whether it helps. However, both they and I freshly installed the package and got the same error. It makes me wonder why other functions like `emptyDrops()` work just fine. Is there something special about `swappedDrops()`?

With an older package version, DropletUtils_1.6.1 under R 3.6.2, `swappedDrops()` actually works on the cluster, but not with any newer version. I tried to use the output from that function as input to `emptyDrops()` of a newer package version, which worked, but I then encountered compatibility issues when generating a `SingleCellExperiment` object:

I guess it's rather bad practice to do that; it was just intended as a workaround. I'll try staying in the older package environment longer, i.e. calling `emptyDrops()` and generating a `SingleCellExperiment` object there, and see whether this is then accepted further downstream in a newer environment of Bioconductor 3.12 or 3.13.

Mike is probably on the money here. If you're on a cluster and the login nodes (where most interactive work is done, including installation of new packages) use a different architecture from your worker nodes (where the jobs are actually submitted) and your compilation settings include something similar to
`-march=native`, it is common to see these "invalid instruction" errors.

If you are not willing to remove the `-march=native` or equivalent setting, you have little choice but to recompile the affected software on the same cluster node where the job is being executed. It is not enough to reinstall it on a different node; the compilation must be done on the _exact same node_. Like, literally: start a job, re-install the packages and then run your actual code. This must be done every time.

If Mike and I are correct, then the successful operation of other DropletUtils functions is purely down to luck. If the instruction sets are changing, then there's no guarantee that anything will work. In fact, if you look at your valgrind output, the error happens before you even get to `swappedDrops`, or before you even load DropletUtils!

The other possibility is that your system has instruction sets that are too new or wacky to be recognized by your valgrind installation. In that case, I don't really know how to help you.
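To see whether the R installation compiles packages with architecture-specific settings such as `-march=native`, one can inspect the flags recorded in R's `Makeconf`. The sketch below assumes the usual layout where `Makeconf` sits in `R.home("etc")`; `arch_flags()` is a hypothetical helper, and the pattern it searches for (`-march=`, `-mtune=`, `-mcpu=`, `-mavx...`) is an illustrative choice, not an exhaustive list. A site or personal `Makevars` file can add further flags that this does not see.

```r
# Minimal sketch: list architecture-specific compiler flags found in Makeconf.
# arch_flags() is an illustrative helper, not part of R or any package.
arch_flags <- function(makeconf_lines) {
    # Keep only assignments to flag variables, e.g. "CXX11FLAGS = -g -O2"
    flag_lines <- grep("^[A-Z0-9_]*FLAGS[[:space:]]*=", makeconf_lines, value = TRUE)
    # Drop the "NAME =" part and split the remainder into individual tokens
    tokens <- unlist(strsplit(sub("^[^=]*=", "", flag_lines), "[[:space:]]+"))
    unique(grep("^-(march|mtune|mcpu)=|^-mavx", tokens, value = TRUE))
}

makeconf <- file.path(R.home("etc"), "Makeconf")
if (file.exists(makeconf)) {
    print(arch_flags(readLines(makeconf)))
}
```

An empty result suggests the defaults carry no such flags; from the shell, `R CMD config CXXFLAGS` reports the same information for a single variable.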
Thanks a lot for your feedback. Sorry, I forgot to mention that I also tried the function on the login node, where I compiled the packages, but it did not work there either.

I pointed the IT team maintaining the cluster to your feedback and they tried a few compilation options, most of which did not work. Even the option `-mtune=generic` did not work. However, one alternative that eventually worked was the most generic setup: not specifying any additional compilation flags and using only the R defaults.
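The thread does not say exactly what the IT team changed, so the following is only a sketch of one way a user might try a comparable "plain flags" build for packages without touching the system-wide R. It assumes that overriding the flag variables through a personal Makevars file (pointed to by the `R_MAKEVARS_USER` environment variable) is enough; if R itself or dependencies such as Rhdf5lib or beachmat were built with architecture-specific flags, they may need rebuilding too. `write_plain_makevars()` is a hypothetical helper and `-g -O2` is an assumed stand-in for "default" flags.

```r
# Minimal sketch: build packages with plain flags via a personal Makevars file.
# write_plain_makevars() is an illustrative helper; "-g -O2" is an assumption.
write_plain_makevars <- function(path = tempfile("Makevars-")) {
    vars <- c("CFLAGS", "CXXFLAGS", "CXX11FLAGS", "CXX14FLAGS", "CXX17FLAGS")
    # One override per line, e.g. "CXXFLAGS = -g -O2"
    writeLines(paste(vars, "= -g -O2"), path)
    path
}

mv <- write_plain_makevars()
# R CMD INSTALL (and hence install.packages) reads this variable
Sys.setenv(R_MAKEVARS_USER = mv)
cat(readLines(mv), sep = "\n")

# Not run here: reinstall from source so the override takes effect.
# `force = TRUE` requires a reasonably recent BiocManager.
# BiocManager::install("DropletUtils", type = "source", force = TRUE)
```

The override only affects code compiled after it is set, so any previously installed compiled dependency keeps whatever flags it was built with.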