Dear All,
I'm trying to analizesome ATAC-seq samples using the ATACseqQC package and I am unable to make it work correctly.
The proccess always crashes when I reach the same part of the workflow shown in the vignette this function
fragSize <- fragSizeDist(bamfile, bamfile.labels)
After a while running a memory overflow happens and the proccess gets killed.
I'm trying to analyze 100bp Illumina Paired-End samples aligned with Bowtie2. The de-duplicated Bamfiles size is around 3,5 GB and the memory overflow occurs at 84 GB.
Is there any minimmum RAM requirement specifiedfor ATACseqQC that I overlooked? Or is it just that this particular package uses huge amounts of memory anyway?
Does anybody know af an alternative way to carry out this kind of analysis (ATACseq) that render some quality Heatmap and coverage curve for nucleosome positions?
Here is my code (up to the step it works):
library(ATACseqQC) library(ChIPpeakAnno) library(BSgenome.Hsapiens.UCSC.hg19) library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(phastCons100way.UCSC.hg19) library(MotifDb) knitr::opts_chunk$set(warning=FALSE, message=FALSE) setwd("/path_to_filees/BAM_dedup/") file.list <- as.list(list.files(".", pattern = "*.bam$")) file.list bamfile <-file.list[[1]] bamfile.labels <- gsub(".bam", "", basename(bamfile)) bamQC(bamfile, outPath=NULL) ## generate fragement size distribution fragSize <- fragSizeDist(bamfile, bamfile.labels)
And my Session Info:
> sessionInfo() R version 3.4.2 (2017-09-28) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Scientific Linux 7.4 (Nitrogen) Matrix products: default BLAS: /opt/R/R-3.4.2/lib/libRblas.so LAPACK: /opt/R/R-3.4.2/lib/libRlapack.so locale: [1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C [3] LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8 [5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=es_ES.UTF-8 [7] LC_PAPER=es_ES.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] grid stats4 parallel stats graphics grDevices utils [8] datasets methods base other attached packages: [1] MotifDb_1.20.0 [2] phastCons100way.UCSC.hg19_3.6.0 [3] GenomicScores_1.2.1 [4] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 [5] GenomicFeatures_1.30.0 [6] AnnotationDbi_1.40.0 [7] Biobase_2.38.0 [8] BSgenome.Hsapiens.UCSC.hg19_1.4.0 [9] BSgenome_1.46.0 [10] rtracklayer_1.38.2 [11] ChIPpeakAnno_3.12.4 [12] VennDiagram_1.6.18 [13] futile.logger_1.4.3 [14] GenomicRanges_1.30.1 [15] GenomeInfoDb_1.14.0 [16] Biostrings_2.46.0 [17] XVector_0.18.0 [18] IRanges_2.12.0 [19] ATACseqQC_1.2.7 [20] S4Vectors_0.16.0 [21] BiocGenerics_0.24.0 loaded via a namespace (and not attached): [1] ProtGenerics_1.10.0 bitops_1.0-6 [3] matrixStats_0.52.2 bit64_0.9-7 [5] progress_1.1.2 httr_1.3.1 [7] tools_3.4.2 R6_2.2.2 [9] rGADEM_2.26.0 splitstackshape_1.4.2 [11] seqLogo_1.44.0 DBI_0.7 [13] lazyeval_0.2.1 colorspace_1.3-2 [15] ade4_1.7-10 motifStack_1.22.0 [17] prettyunits_1.0.2 RMySQL_0.10.13 [19] bit_1.1-12 curl_3.1 [21] compiler_3.4.2 graph_1.56.0 [23] DelayedArray_0.4.1 grImport_0.9-0 [25] scales_0.5.0 randomForest_4.6-12 [27] RBGL_1.54.0 stringr_1.2.0 [29] digest_0.6.12 Rsamtools_1.30.0 [31] pkgconfig_2.0.1 htmltools_0.3.6 [33] ensembldb_2.2.0 limma_3.34.0 [35] regioneR_1.10.0 htmlwidgets_0.9 [37] rlang_0.1.4 RSQLite_2.0 [39] BiocInstaller_1.28.0 shiny_1.0.5 [41] BiocParallel_1.12.0 RCurl_1.95-4.8 [43] magrittr_1.5 GO.db_3.5.0 [45] GenomeInfoDbData_1.0.0 Matrix_1.2-11 [47] Rcpp_0.12.13 munsell_0.4.3 [49] stringi_1.1.6 yaml_2.1.16 [51] MASS_7.3-47 SummarizedExperiment_1.8.1 [53] zlibbioc_1.24.0 plyr_1.8.4 [55] AnnotationHub_2.10.1 blob_1.1.0 [57] lattice_0.20-35 splines_3.4.2 [59] multtest_2.34.0 MotIV_1.34.0 [61] seqinr_3.4-5 biomaRt_2.34.1 [63] futile.options_1.0.0 XML_3.98-1.9 [65] data.table_1.10.4-3 lambda.r_1.2 [67] idr_1.2 httpuv_1.3.5 [69] assertthat_0.2.0 mime_0.5 [71] xtable_1.8-2 AnnotationFilter_1.2.0 [73] survival_2.41-3 tibble_1.3.4 [75] GenomicAlignments_1.14.1 memoise_1.1.0 [77] interactiveDisplayBase_1.16.0
Thanks in advance!
JL
Hi JL,
Thank you for trying ATACseqQC. Please have a try to remove bamQC from you current script. And have a try. I'm working on bamQC to improve this issue.
Jianhong.
Dear Jianhong,
I have tried the following script (the bamQC command actually worked). I tried your suggestion using only a couple of chromosomes (e.g. 1 and 2) and it crashes at the a point a little bit further...
SessionInfo is the same than the previous but now I am using 96 GB of RAM
Any thoughts/Suggestions or update of the package ?
Thanks in advance for your kind help
Best
JL
Getting also process killed when reading huge bam file ~ 9G with readBamFile function.
One of the developers, Jianhong, will work on making it work with big bam files. For now, please try to generate the plots using a couple of chromosomes from your BAM files. Thanks!
Best regards,
Julie
Thanks to Jianhong, the memory issue has be fixed in the development version.
Here are the scripts to install the dev version of ATACseqQC.
library(devtools); install_github("jianhong/ATACseqQC")
Please set bigFile=TRUE when using readBamFile. Thanks!
Best regards,
Julie