DECIPHER::AlignSeqs fails with a large guide tree
@brendanfurneaux-13009

I'm trying to use DECIPHER::AlignSeqs to progressively align a large number of sequences (>10000) for which I have a guide tree. This fails in dendrapply, apparently because of excessive recursion. Here is an example that reproduces the error:

# raise the limit on nested expressions (see the error messages below)
options(expressions = 1e4)
n <- 1e4
labels <- paste0("t", seq.int(n))

# generate a random alignment
aln <- sample(Biostrings::DNA_BASES, 100 * n, replace = TRUE)
aln <- matrix(aln, nrow = n)
aln <- apply(aln, 1, paste, collapse = "")
names(aln) <- labels
aln <- Biostrings::DNAStringSet(aln)

# create a very deep tree
x <- exp(-seq.int(n)/100)
names(x) <- labels
dist <- dist(x)
tree <- hclust(dist, method = "single")
tree <- as.dendrogram(tree)

DECIPHER::AlignSeqs(
  myXStringSet = aln,
  guideTree = tree,
  iterations = 0,
  refinements = 0
)

My first attempt gave "Error: C stack usage 7971876 is too close to the limit". On a machine with more RAM I instead got "Error: evaluation nested too deeply: infinite recursion / options(expressions=)?". After including options(expressions = 10000) as above, I get "Error: node stack overflow". I'm not aware of any way to increase the size of the node stack.
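For context, all three errors point to the nesting depth of the tree rather than the number of sequences. The depth can be checked without any recursion by walking the tree with an explicit stack. The following is a minimal sketch; `dendro_depth` is a hypothetical helper written for illustration and is not part of DECIPHER or stats.

```r
# Hypothetical helper (not part of DECIPHER or stats): returns the number of
# nodes on the longest root-to-leaf path, using an explicit stack instead of
# recursive function calls.
dendro_depth <- function(d) {
  stack <- list(list(node = d, depth = 1L))
  max_depth <- 0L
  while (length(stack) > 0L) {
    # pop the most recently pushed node
    top <- stack[[length(stack)]]
    stack[[length(stack)]] <- NULL
    max_depth <- max(max_depth, top$depth)
    if (!stats::is.leaf(top$node)) {
      # push every child, one level deeper than its parent
      for (i in seq_along(top$node)) {
        stack[[length(stack) + 1L]] <-
          list(node = top$node[[i]], depth = top$depth + 1L)
      }
    }
  }
  max_depth
}

# Small chain-like tree built the same way as in the example above
small <- as.dendrogram(hclust(dist(c(1, 2, 4, 8, 16)), method = "single"))
dendro_depth(small)
```

Calling `dendro_depth(tree)` on the single-linkage tree from the example should report a depth close to `n`, which would explain why a recursive traversal runs into R's nesting limits.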

I've tried sorting the tree to have the deepest branches listed first or last, in hopes that this might clear some of the node stack, but it doesn't seem to help.

Is there any way to do this within DECIPHER? AlignSeqs works on alignments this large when it generates its own guide tree internally, so this seems to be an issue only with preprocessing the externally provided tree.

Erik Wright
@erik-wright-14386

Hi Brendan,

Thanks for catching this issue, and I appreciate the easily reproducible example.

Unbeknownst to me, dendrapply() is recursive rather than iterative. AlignSeqs() otherwise avoids recursion to prevent exactly this kind of error, but its call to dendrapply() triggers the error when a guideTree is provided. I have converted this use of dendrapply() to rapply(), which avoids recursion.
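To illustrate the general idea (this is not the actual change made in DECIPHER), here is a minimal sketch of using rapply() on a dendrogram. It relies on a dendrogram being a nested list whose leaves are non-list elements carrying attributes, so rapply() can apply a function to each leaf without any R-level recursive calls. The tree and the variable names `tree_small` and `leaf_labels` are made up for the example.

```r
# Small example tree with named leaves
hc <- hclust(dist(c(a = 1, b = 2, c = 4, d = 8)), method = "single")
tree_small <- as.dendrogram(hc)

# rapply() applies the function to every non-list element of the nested
# list, i.e. to each leaf; the leaf's attributes (such as "label") are
# available inside the function.
leaf_labels <- rapply(
  tree_small,
  function(leaf) attr(leaf, "label"),
  how = "unlist"
)
leaf_labels
```

The result lists the leaf labels in the order the leaves appear in the tree. Note that rapply() only visits leaves, so it is a replacement for dendrapply() only when nothing needs to be done at the internal nodes.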

This fix will become part of the next release of DECIPHER (early May). In the meantime, please email me directly for the updated AlignSeqs().

Erik
