Hello,
I tried to use the latest version of wavClusteR(2.1.0) to analyze an illumina sequencing data set.
However, I have a problem with using this package.
It worked well with example file and some files(small size of files).
I followed the R script exactly.
When I run some large files, I got an error at 'getAllSub(Bam, minCov=10)' like this:
> library(wavClusteR)
> filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
> BAM <- readSortedBam(filename = filename)
> BAM
GRanges object with 1468971 ranges and 2 metadata columns:
seqnames ranges strand | qseq MD
<Rle> <IRanges> <Rle> | <DNAStringSet> <character>
[1] chr1 [22559, 22572] - | GGTCCAGCACGAGG 14
[2] chr1 [25049, 25069] - | AGGCCAGGAGCGCATCGACGG 15A2T2
[3] chr1 [25049, 25071] - | AGGCCAGGAGCGCATCGCCGGGA 15A1A0T4
[4] chr1 [25049, 25063] - | AGGCCAGGAGCGCAT 15
[5] chr1 [25049, 25063] - | AGGCCAGGAGCGCAT 15
... ... ... ... ... ... ...
[1468967] chrM [16384, 16414] - | CAGATAGGGGTCCCTTGACCACCATCCTCCC 30G0
[1468968] chrM [16433, 16461] + | CAGGAGTGCTACTCTCCTCGCTCCGGGCC 2A26
[1468969] chrM [16443, 16471] - | ACTCTCCTCGCTCCGGGCCCATAACGCGC 25A1T0T0
[1468970] chrM [16443, 16471] - | ACTCTCCTCGCTCCGGGCCCATAACACTT 29
[1468971] chrM [16444, 16465] - | CTCTCCTCGCTCCGGGCCTGTA 18C0A2
-------
seqinfo: 25 sequences from an unspecified genome; no seqlengths
> countTable <- getAllSub( BAM, minCov = 10 )
Loading required package: doMC
Considering substitutions, n = 117586, processing in 118 chunks
chunk #: 1
chunk #: 2
.
.
.
.
chunk #: 117
chunk #: 118
GRanges object with 1728352 ranges and 2 metadata columns:
seqnames ranges strand | substitutions coverage
<Rle> <IRanges> <Rle> | <character> <numeric>
[1] chrM [2230, 2230] + | AC Inf
[2] chrM [2231, 2231] + | AG Inf
[3] chrM [2232, 2232] + | AT Inf
[4] chrM [2355, 2355] + | AG Inf
[5] chrM [2356, 2356] + | AC Inf
... ... ... ... ... ... ...
[1728348] chr1 [49479464, 49479464] + | TG Inf
[1728349] chr1 [49479468, 49479468] + | TG Inf
[1728350] chrM [ 13340, 13340] - | AG Inf
[1728351] chrM [ 13350, 13350] - | AG Inf
[1728352] chrM [ 13356, 13356] - | AG Inf
-------
seqinfo: 25 sequences from an unspecified genome; no seqlengths
considering the + strand
Computing local coverage at substitutions...
Error in NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
subscript contains NAs or out-of-bounds indices
Here is the output of seesionInfo.
>sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] doSNOW_1.0.12 snow_0.3-13 iterators_1.0.7 foreach_1.4.2 wavClusteR_2.1.0 Rsamtools_1.18.2 Biostrings_2.34.1
[8] XVector_0.6.0 GenomicRanges_1.18.4 GenomeInfoDb_1.2.4 IRanges_2.0.1 S4Vectors_0.4.0 BiocGenerics_0.12.1
loaded via a namespace (and not attached):
[1] acepack_1.3-3.3 ade4_1.6-2 AnnotationDbi_1.28.1 base64enc_0.1-2 BatchJobs_1.5 BBmisc_1.9
[7] Biobase_2.26.0 BiocParallel_1.0.3 biomaRt_2.22.0 bitops_1.0-6 brew_1.0-6 checkmate_1.5.1
[13] cluster_2.0.1 codetools_0.2-10 colorspace_1.2-4 compiler_3.1.2 DBI_0.3.1 digest_0.6.8
[19] fail_1.2 foreign_0.8-62 Formula_1.2-0 GenomicAlignments_1.2.1 GenomicFeatures_1.18.3 ggplot2_1.0.0
[25] grid_3.1.2 gtable_0.1.2 Hmisc_3.14-6 ifultools_2.0-1 lattice_0.20-29 latticeExtra_0.6-26
[31] MASS_7.3-37 mclust_4.4 munsell_0.4.2 nnet_7.3-8 plyr_1.8.1 proto_0.3-10
[37] RColorBrewer_1.1-2 Rcpp_0.11.4 RCurl_1.95-4.5 reshape2_1.4.1 rpart_4.1-8 RSQLite_1.0.0
[43] rtracklayer_1.26.2 scales_0.2.4 sendmailR_1.2-1 seqinr_3.1-3 splines_3.1.2 splus2R_1.2-0
[49] stringr_0.6.2 survival_2.37-7 tools_3.1.2 wmtsa_2.0-0 XML_3.98-1.1 zlibbioc_1.12.0
I tried with wavClusteR(version 2.0.0) too but I had same error.
Please give me an advice to figure it out.
Thank you.
- Hee -