I am attempting to create a GmapGenome index for hg19. Here is the code I am using:
library(gmapR)
library(BSgenome.Hsapiens.UCSC.hg19)
gmapGenomePath <- '/data/staylo/ref/ucsc/hg19/genome/gsnap/'
gmapGenomeDirectory <- GmapGenomeDirectory(gmapGenomePath, create=TRUE)
hg19 <- GmapGenome(genome=Hsapiens, directory=gmapGenomeDirectory, name='hg19', create=TRUE, k=12L)
The code exits with non-0 status:
Writing 16777217 offsets to file with total of 965773859 k-mers...done
Running cat ./hg19.genomecomp | /home/staylo/R/x86_64-redhat-linux-gnu-library/3.2/gmapR/usr/bin//gmapindex -b 12 -k 12 -q 3 -d hg19 -F . -D . -P
Looking for index files in directory . (offsets not compressed)
  Offsets file is hg19.ref123offsets
  Positions file is hg19.ref123positions
cat: write error: Broken pipe
sh: line 1: 9040 Done(1) cat ./hg19.genomecomp
     9041 Segmentation fault (core dumped) | /home/staylo/R/x86_64-redhat-linux-gnu-library/3.2/gmapR/usr/bin//gmapindex -b 12 -k 12 -q 3 -d hg19 -F . -D . -P
cat ./hg19.genomecomp | /home/staylo/R/x86_64-redhat-linux-gnu-library/3.2/gmapR/usr/bin//gmapindex -b 12 -k 12 -q 3 -d hg19 -F . -D . -P failed with return code 35584 at /home/staylo/R/x86_64-redhat-linux-gnu-library/3.2/gmapR/usr/bin/gmap_build line 259.
Error in .gmap_build(db = genome(genome), dir = path(directory(genome)), :
  system call returned a non-0 status: /home/staylo/R/x86_64-redhat-linux-gnu-library/3.2/gmapR/usr/bin/gmap_build --db=hg19 --dir=/data/staylo/ref/ucsc/hg19/genome/gsnap --kmer=12 --sort=none --circular=chrM -B /home/staylo/R/x86_64-redhat-linux-gnu-library/3.2/gmapR/usr/bin/ /tmp/RtmpsJnTvJ/gmap_build_fasta1efb63a8be94
I believe this is because the build has filled up my tmp directory, which is currently only 5GB on the system I am using. If I index something smaller, say chrX or even chr2, the code completes. I am working with our IT support to increase the size of tmp. In the meantime, is there an argument I could pass to specify a different location for tmp, for instance a larger data partition?
Thanks,
Sean
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.38.0 rtracklayer_1.30.1 dplyr_0.4.3
 [5] Rbowtie_1.10.0 ShortRead_1.28.0 GenomicAlignments_1.6.1 BiocParallel_1.4.3
 [9] sangerseqR_1.6.0 seqTools_1.4.1 zlibbioc_1.16.0 sagenhaft_1.40.0
[13] SparseM_1.7 gmapR_1.12.0 VariantAnnotation_1.16.4 Rsamtools_1.22.0
[17] SummarizedExperiment_1.0.1 Biobase_2.30.0 GenomicRanges_1.22.2 GenomeInfoDb_1.6.1
[21] msa_1.2.1 Biostrings_2.38.2 XVector_0.10.0 IRanges_2.4.6
[25] S4Vectors_0.8.5 BiocGenerics_0.16.1

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.2 RColorBrewer_1.1-2 futile.logger_1.4.1 GenomicFeatures_1.22.7 bitops_1.0-6 futile.options_1.0.0
 [7] tools_3.2.2 biomaRt_2.26.1 digest_0.6.8 lattice_0.20-33 RSQLite_1.0.0 shiny_0.12.2
[13] DBI_0.3.1 hwriter_1.3.2 grid_3.2.2 R6_2.1.1 AnnotationDbi_1.32.3 XML_3.98-1.3
[19] latticeExtra_0.6-26 magrittr_1.5 lambda.r_1.1.7 htmltools_0.2.6 assertthat_0.1 xtable_1.8-0
[25] mime_0.4 httpuv_1.3.3 RCurl_1.95-4.7
Thought of that and checked with my IT support, but that environment variable is shared globally by the other users, so it would reroute more than I bargained for. If that is the only solution, then I can work with our IT support to set up a bigger tmp location. I was hoping there might be a software solution where I could specify it as a one-off for just this command.
You can set that variable at the process level with Sys.setenv().
Tried that too and kept getting the old tmp directory. I guess I will have to wait for IT to build a bigger tmp.
Sorry, it looks like you need to set that when R is started. So just start R at the shell with:
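The exact command is not preserved in this thread. As a minimal sketch, assuming the variable in question is TMPDIR (R picks its session temporary directory at startup from TMPDIR, TMP, then TEMP, and falls back to /tmp if none points to a writable directory), something like the following would set it for a single process only. The helper name with_tmpdir and the scratch path are hypothetical, not part of R or gmapR.

```shell
# Hypothetical helper: run one command with TMPDIR pointing at a given directory.
# The variable is set only for that process, so other users and shells are unaffected.
with_tmpdir() {
    dir="$1"
    shift
    # Create the directory if needed; R ignores a TMPDIR that does not exist.
    mkdir -p "$dir" || return 1
    # R also ignores a TMPDIR that is not writable and silently uses /tmp instead.
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir" >&2
        return 1
    fi
    env TMPDIR="$dir" "$@"
}

# Example (assumed path on a larger partition; replace with your own):
#   with_tmpdir "$HOME/gmap_tmp" R
# Inside that R session, tempdir() should then report a subdirectory of the chosen path.
```

The writability check matters here: if the directory cannot be written to, R does not raise an error but quietly keeps using /tmp, which looks exactly like the setting being ignored.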
Aha! I had tried that as well with no luck, but I think I may have been pointing it to a directory that was not writable. Once I pointed it to a valid location it started to work and I was able to complete the indexing. Thanks for the help!