Hi, I generated some input data with Bismark, with the following code:
bismark_methylation_extractor -p --scaffolds --bedGraph --buffer_size 40G --genome_folder /home/juaguila/Bismark/ --cytosine_report --parallel 10 ${base}.deduplicated.bam
I want to identify differentially methylated region with dmrseq, and I tested a few files if they could be read by bsseq properly.
files <- c("V00001.deduplicated.bismark.cov.gz", "V00750.deduplicated.bismark.cov.gz",
"V00796.deduplicated.bismark.cov.gz")
samples <- c("V00001", "V00750", "V00796")
bismarkBSseq <- read.bismark(files = files,
rmZeroCov = TRUE,
strandCollapse = TRUE,
verbose = TRUE,
nThread = 10L)
bismarkBSseq
saveRDS(bismarkBSseq, file = "dmrseq_LH.meth.rds")
I was told to save every output, since the code can fail so I used saveRDS, which has worked before when running MethylKit.
I tried reading and creating the matrices with my files, it was supposedly working , but no output was generated. Everything loaded well, but then, it didn't finish generating the matrices.
[read.bismark] Parsing files and constructing valid loci ...
Done in 64.9 secs
[read.bismark] Parsing files and constructing 'M' and 'Cov' matrices ...
The files look okay:
gzip -cd V00864.deduplicated.bismark.cov.gz | head
NW_022882922.1 4856 4856 0 0 4
NW_022882922.1 4866 4866 0 0 4
NW_022882922.1 4889 4889 0 0 5
NW_022882922.1 4892 4892 0 0 5
NW_022882922.1 4904 4904 0 0 5
NW_022882922.1 4953 4953 0 0 6
NW_022882922.1 4962 4962 0 0 7
NW_022882922.1 4966 4966 0 0 6
NW_022882922.1 4976 4976 0 0 5
NW_022882922.1 4977 4977 0 0 1
I am analyzing methylation data in bumblebees and each file has about 20.M lines, though not all the sites are present in all individuals.
Any ideas why this code did not work? I am running it in a server with 500GB of Ram and 48 cores.
I really want to figure and solve out this issue, because I have 44 files, and now I just tested 6 (six) files. So, at least it should work with a small sample size.
Could you tell if my code okay? I would really appreciate if you could help figure out what is going on.
Thank you very much;
> sessionInfo( )
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /home/applications/R/4.3.1/lib64/R/lib/libRblas.so
LAPACK: /home/applications/R/4.3.1/lib64/R/lib/libRlapack.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] dmrseq_1.20.1 bsseq_1.36.0
[3] SummarizedExperiment_1.30.2 Biobase_2.60.0
[5] MatrixGenerics_1.12.3 matrixStats_1.2.0
[7] GenomicRanges_1.52.0 GenomeInfoDb_1.36.4
[9] IRanges_2.34.1 S4Vectors_0.38.2
[11] BiocGenerics_0.46.0
loaded via a namespace (and not attached):
[1] DBI_1.2.2 bitops_1.0-7
[3] permute_0.9-7 biomaRt_2.56.1
[5] rlang_1.1.3 magrittr_2.0.3
[7] compiler_4.3.1 RSQLite_2.3.5
[9] GenomicFeatures_1.52.2 DelayedMatrixStats_1.22.6
[11] reshape2_1.4.4 png_0.1-8
[13] vctrs_0.6.5 stringr_1.5.1
[15] pkgconfig_2.0.3 crayon_1.5.2
[17] fastmap_1.1.1 dbplyr_2.3.4
[19] XVector_0.40.0 ellipsis_0.3.2
[21] utf8_1.2.4 Rsamtools_2.16.0
[23] promises_1.2.1 tzdb_0.4.0
[25] bit_4.0.5 zlibbioc_1.46.0
[27] cachem_1.0.8 progress_1.2.2
[29] blob_1.2.4 later_1.3.1
[31] rhdf5filters_1.12.1 DelayedArray_0.26.7
[33] Rhdf5lib_1.22.1 BiocParallel_1.34.2
[35] interactiveDisplayBase_1.38.0 prettyunits_1.2.0
[37] parallel_4.3.1 R6_2.5.1
[39] RColorBrewer_1.1-3 stringi_1.8.3
[41] limma_3.56.2 rtracklayer_1.60.1
[43] iterators_1.0.14 Rcpp_1.0.11
[45] R.utils_2.12.3 readr_2.1.4
[47] splines_4.3.1 httpuv_1.6.11
[49] Matrix_1.6-5 tidyselect_1.2.0
[51] abind_1.4-5 yaml_2.3.8
[53] codetools_0.2-19 curl_5.2.1
[55] doRNG_1.8.6 plyr_1.8.8
[57] regioneR_1.32.0 lattice_0.21-8
[59] tibble_3.2.1 shiny_1.7.5.1
[61] KEGGREST_1.40.1 BiocFileCache_2.8.0
[63] xml2_1.3.6 Biostrings_2.68.1
[65] pillar_1.9.0 BiocManager_1.30.22
[67] filelock_1.0.3 rngtools_1.5.2
[69] foreach_1.5.2 generics_0.1.3
[71] RCurl_1.98-1.14 ggplot2_3.4.3
[73] hms_1.1.3 BiocVersion_3.17.1
[75] sparseMatrixStats_1.12.2 munsell_0.5.0
[77] scales_1.3.0 bumphunter_1.42.0
[79] gtools_3.9.5 xtable_1.8-4
[81] glue_1.7.0 tools_4.3.1
[83] AnnotationHub_3.8.0 BiocIO_1.10.0
[85] data.table_1.15.2 BSgenome_1.68.0
[87] locfit_1.5-9.9 GenomicAlignments_1.36.0
[89] XML_3.99-0.16.1 rhdf5_2.44.0
[91] grid_4.3.1 AnnotationDbi_1.62.2
[93] colorspace_2.1-0 nlme_3.1-164
[95] GenomeInfoDbData_1.2.10 HDF5Array_1.28.1
[97] restfulr_0.0.15 annotatr_1.26.0
[99] cli_3.6.2 rappdirs_0.3.3
[101] fansi_1.0.6 S4Arrays_1.0.6
[103] dplyr_1.1.3 gtable_0.3.4
[105] outliers_0.15 R.methodsS3_1.8.2
[107] digest_0.6.35 rjson_0.2.21
[109] memoise_2.0.1 htmltools_0.5.7
[111] R.oo_1.26.0 lifecycle_1.0.4
[113] httr_1.4.7 mime_0.12
[115] bit64_4.0.5
If the function didn't finish running, you shouldn't expect any results. Did you kill the process? How did it not finish?