I've been trying to use the RnBeadsAnnotationCreator (https://github.com/epigen/RnBeadsAnnotationCreator) to create an annotation for the rn6 build of the rat genome. I started with the createAnnotationPackage.rn5.R file and updated it for rn6 see: but i've been unable to get it to build the rn6 annotation.
When I call: createAnnotationPackage("rn6", dest = "~/Documents/myCustomRlibs/RnBeads.rn6")
The VCF files apprear to be sucessfully downloaded but contain this warning in the output:
2019-02-04 14:57:56 1.8 WARNING Could not detect MAF data (CAF field)
2019-02-04 14:57:56 1.8 WARNING Could not detect MAF data (G5 field); allele frequency is not considered
And then I get this error:
2019-02-04 14:59:09 2.3 ERROR dbSNP table(s) are too large. MapReduce implemented in ~/Documents/myCustomRlibs/RnBeads.rn6/RnBeads.rn6/temp/split.snps.R
Error in logger.error(paste("dbSNP table(s) are too large. MapReduce implemented in", :
dbSNP table(s) are too large. MapReduce implemented in ~/Documents/myCustomRlibs/RnBeads.rn6/RnBeads.rn6/temp/split.snps.R
I took a look at the source of the rnb.update.dbsnp()
function (https://rdrr.io/github/epigen/RnBeadsAnnotationCreator/src/R/snps.R), which appears to be the source of the error, but haven't been able to figure out why it's triggering an error.
Any suggestions would be much appreciated.
SessionInfo()
:
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.10 (Santiago)
Matrix products: default
BLAS: /home/local/software/R/3.4.2/lib64/R/lib/libRblas.so
LAPACK: /home/local/software/R/3.4.2/lib64/R/lib/libRlapack.so
locale:
[1] C
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] BSgenome.Rnorvegicus.UCSC.rn6_1.4.1
[2] BSgenome_1.46.0
[3] rtracklayer_1.38.3
[4] RnBeadsAnnotationCreator_0.99.0
[5] RnBeads_1.10.8
[6] plyr_1.8.4
[7] methylumi_2.24.1
[8] minfi_1.24.0
[9] bumphunter_1.20.0
[10] locfit_1.5-9.1
[11] Biostrings_2.46.0
[12] XVector_0.18.0
[13] SummarizedExperiment_1.8.1
[14] DelayedArray_0.4.1
[15] FDb.InfiniumMethylation.hg19_2.2.0
[16] org.Hs.eg.db_3.5.0
[17] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[18] GenomicFeatures_1.30.3
[19] AnnotationDbi_1.40.0
[20] reshape2_1.4.3
[21] scales_0.5.0
[22] Biobase_2.38.0
[23] illuminaio_0.20.0
[24] matrixStats_0.53.1
[25] limma_3.34.9
[26] gridExtra_2.3
[27] gplots_3.0.1
[28] ggplot2_3.1.0
[29] fields_9.6
[30] maps_3.3.0
[31] spam_2.2-1
[32] dotCall64_1.0-0
[33] ff_2.2-14
[34] bit_1.1-14
[35] cluster_2.0.7-1
[36] MASS_7.3-50
[37] GenomicRanges_1.30.3
[38] GenomeInfoDb_1.14.0
[39] IRanges_2.12.0
[40] S4Vectors_0.16.0
[41] BiocGenerics_0.24.0
[42] doParallel_1.0.14
[43] iterators_1.0.10
[44] foreach_1.4.4
loaded via a namespace (and not attached):
[1] colorspace_1.3-2 siggenes_1.52.0 mclust_5.4
[4] base64_2.0 bit64_0.9-7 xml2_1.2.0
[7] codetools_0.2-15 splines_3.4.2 Rsamtools_1.30.0
[10] annotate_1.56.2 readr_1.3.1 compiler_3.4.2
[13] httr_1.3.1 assertthat_0.2.0 Matrix_1.2-14
[16] lazyeval_0.2.1 prettyunits_1.0.2 tools_3.4.2
[19] bindrcpp_0.2.2 gtable_0.2.0 glue_1.3.0
[22] GenomeInfoDbData_1.0.0 dplyr_0.7.5 doRNG_1.6.6
[25] Rcpp_1.0.0 multtest_2.34.0 gdata_2.18.0
[28] preprocessCore_1.40.0 nlme_3.1-137 stringr_1.3.1
[31] rngtools_1.3.1 gtools_3.5.0 XML_3.98-1.11
[34] beanplot_1.2 zlibbioc_1.24.0 hms_0.4.2
[37] GEOquery_2.46.15 RColorBrewer_1.1-2 memoise_1.1.0
[40] pkgmaker_0.27 biomaRt_2.34.2 reshape_0.8.7
[43] stringi_1.2.4 RSQLite_2.1.1 genefilter_1.60.0
[46] RMySQL_0.10.15 caTools_1.17.1 BiocParallel_1.12.0
[49] bibtex_0.4.2 rlang_0.3.1 pkgconfig_2.0.2
[52] bitops_1.0-6 nor1mix_1.2-3 lattice_0.20-35
[55] purrr_0.3.0 bindr_0.1.1 GenomicAlignments_1.14.2
[58] tidyselect_0.2.5 magrittr_1.5 R6_2.2.2
[61] DBI_1.0.0 pillar_1.3.1 withr_2.1.2
[64] survival_2.42-3 RCurl_1.95-4.10 tibble_2.0.1
[67] crayon_1.3.4 KernSmooth_2.23-15 progress_1.2.0
[70] data.table_1.11.4 blob_1.1.1 digest_0.6.15
[73] xtable_1.8-2 tidyr_0.8.2 openssl_1.0.1
[76] munsell_0.5.0 registry_0.5 quadprog_1.5-5
Hi Michael,
I'm not quite clear on what exactly the maxSNPs parameter does, would reducing it just change the 'batch' size for processing or will it mean some SNPs might not be available in the annotation?
I was logged into an HPC cluster using the PBS/Torque a job management setup when I tried running this - is there something I have to do to set it up to make use of cluster - there didn't seem to be any options related to setting up the jobs?
Thanks for the help, Richard
Hey Richard,
The maxSNPs parameter is used as the maximum batch size. So if you have more SNPs than this number, RnBeads suggests to use a compute cluster for that. This is the case you are in. RnBeads automatically creates a script that you can use for job scheduling with PBS. In your case, it is stored in:
It uses the function rnb.split.snps to create smaller batches of SNPs to be processed. If I were you, I would first try increasing the maxSNPs parameter and then go for the batch processing.