When I run many muscle alignments in a loop in R (v. 3.0.3) under OS X 10.9.5 (64-bit), R's memory usage keeps increasing until the system has to kill the application.
The muscle package version is 3.8.31-4.
I'm doing 100,000+ alignments of pairs of sequences (1000 bp or less each) in a loop. I don't save the alignments; I just store each one in an object that I then process.
The code looks like this (simplified):
library(muscle)
for (i in 1:100000) {
  aln <- muscle(sequences, quiet = TRUE)
  # do stuff with aln
}
The "sequences" object is just a pair of DNA sequences. Note that the "aln" object is overwritten at every iteration, so I don't see any reason why memory usage should grow out of control. I have 32 GB of RAM in my Mac. After approximately 70,000 iterations, the system could no longer compress memory and forced R to stop. At that point R was using more than 80 GB of virtual memory.
And it's not the stuff I do with the aln object that causes the issue.
Thanks for your help.
Jean
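One way to check whether this growth comes from R objects or from memory allocated inside the compiled MUSCLE code is to compare what R's garbage collector reports with the total memory of the R process. The following is a minimal sketch that assumes a Unix-like system where `ps -o rss= -p <pid>` is available (OS X and Linux). `track_memory()` is a hypothetical helper and is not part of any package.

```r
# Hypothetical helper (not from any package): call step_fun(i) n_iter times
# and, every `every` iterations, record two numbers:
#   r_heap_mb : memory currently used by R's heap, as reported by gc()
#   rss_mb    : resident set size of the whole R process, read via `ps`
track_memory <- function(step_fun, n_iter, every = 1000) {
  rss_mb <- function() {
    # `ps -o rss= -p <pid>` prints the resident set size in kilobytes
    kb <- system(paste("ps -o rss= -p", Sys.getpid()), intern = TRUE)
    as.numeric(kb) / 1024  # as.numeric() tolerates the leading spaces
  }
  log <- data.frame(iter = integer(0), r_heap_mb = numeric(0), rss_mb = numeric(0))
  for (i in seq_len(n_iter)) {
    step_fun(i)
    if (i %% every == 0) {
      heap <- sum(gc()[, 2])  # column 2 is "(Mb)" of memory currently used
      log <- rbind(log, data.frame(iter = i, r_heap_mb = heap, rss_mb = rss_mb()))
    }
  }
  log
}
```

You could wrap the loop body in it, e.g. `track_memory(function(i) muscle(sequences, quiet = TRUE), n_iter = 10000)`. If `r_heap_mb` stays roughly flat while `rss_mb` keeps climbing, the memory is being allocated outside R's heap. That would suggest a leak in native code, which `rm()` or `gc()` from R cannot reclaim.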
Hi Jean,
All this is valuable feedback for Alex, the maintainer of the muscle package. Note, though, that you seem to be using an old version of the package. The package used to be on CRAN before it became a Bioconductor package. The current version of the package is 3.10.0 and is part of BioC 3.1, the current version of Bioconductor (which requires R 3.2). Your version number (3.8.31-4) indicates that you got muscle from CRAN, where it's probably archived by now. As Steve suggested, please always use the current version of Bioconductor and only report problems for this version, together with the output of sessionInfo().
Furthermore, as Steve mentioned, the msa package (another Bioconductor package for multiple sequence alignment) also includes the MUSCLE algorithm, in addition to the ClustalW and ClustalOmega algorithms. You could try it out. I know that the authors of msa have made a special effort to avoid memory leaks in some of the algorithms. It would be interesting to know whether you get better results with it.
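For comparison, the msa route might look roughly like this. This minimal sketch assumes that the msa and Biostrings Bioconductor packages are installed. The short sequence pair is made up and stands in for Jean's `sequences` object. It swaps the muscle package's `muscle()` for `msa(..., method = "Muscle")`.

```r
# Assumes the Bioconductor packages msa and Biostrings are installed
library(Biostrings)
library(msa)

# Made-up pair of DNA sequences standing in for `sequences`
sequences <- DNAStringSet(c(seq1 = "ACGTACGTTAGC", seq2 = "ACGTTCGTAGC"))

# Run msa's bundled MUSCLE implementation instead of muscle::muscle()
aln <- msa(sequences, method = "Muscle")

# Coerce the multiple alignment to a plain DNAStringSet for downstream code
aligned <- as(aln, "DNAStringSet")
```

Running this call inside the same 100,000-iteration loop is the direct way to see whether msa's MUSCLE keeps memory stable where the older muscle package did not.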
Finally, let me mention that if you're aligning pairs of sequences, you could also try the pairwiseAlignment() function from the Biostrings package. pairwiseAlignment() is vectorized over pattern and subject, so it can align your 100,000+ pairs in a single call. Based on the timings I get on a modest laptop, this should take no more than 20 minutes and use no more than 1.5 GB of RAM.
Cheers,
H.
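Here is a minimal sketch of that single vectorized pairwiseAlignment() call. It assumes that Biostrings is installed, and the three short sequence pairs are made up for illustration. Element i of `pattern_seqs` is aligned against element i of `subject_seqs`, so the two sets are assumed to have equal length, following the vectorization over pattern and subject described in the reply.

```r
library(Biostrings)

# Made-up example pairs: pattern_seqs[i] is aligned against subject_seqs[i]
pattern_seqs <- DNAStringSet(c("ACGTACGTTAGC", "GGATCCAT", "TTTACGA"))
subject_seqs <- DNAStringSet(c("ACGTTCGTAGC", "GGATCAT", "TTACGGA"))

# One vectorized call replaces the R-level loop over pairs
alns <- pairwiseAlignment(pattern_seqs, subject_seqs, type = "global")

scores <- score(alns)  # one global alignment score per pair
ident  <- pid(alns)    # percent identity per pair
```

If only the scores are needed, passing `scoreOnly = TRUE` makes pairwiseAlignment() return a numeric vector of scores instead of the alignment objects, which should reduce memory use further.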