Hi, I am trying to discover a desired motif in a set of 251 sequences, but my results are not consistent. In some runs I get the desired motif, but in other runs it disappears. I am now trying to find the motif by using a known motif as a seed for my `DNAStringSet` object of sequences.
My seed motif is stored in the file motif.txt and is given here:
A 0.4619 0.927 0.8053 0.9305 0.4262 0.6623 0.4405 0.8018 0.7588 0.8912 0.7517 0.8268 0.4834 0.6158
C 0.0148 0.0077 0.0291 0.0148 0.3046 0.0112 0.022 0.0327 0.0148 0.022 0.0148 0.0184 0.0828 0.0291
G 0.1114 0.0148 0.0077 0.0291 0.0935 0.1221 0.0184 0.1078 0.0685 0.0148 0.1543 0.0291 0.1114 0.14
T 0.4119 0.0506 0.1579 0.0255 0.1758 0.2044 0.5192 0.0577 0.1579 0.072 0.0792 0.1257 0.3224 0.2151
Please tell me a possible command to find a motif similar to the seed motif in the `DNAStringSet`.
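As a starting point, the four rows above can be loaded into a 4 x 14 numeric matrix in base R. This is only a minimal sketch: the variable names `motif_lines` and `pwm` are made up for illustration, and the rows are embedded as text so the snippet runs on its own. With the real file, `read.table("motif.txt", row.names = 1)` should give the same table, assuming the file contains exactly these four lines.

```r
# Motif rows exactly as posted (A, C, G, T); embedded here so the example is
# self-contained. With the real file, use read.table("motif.txt", row.names = 1).
motif_lines <- c(
  "A 0.4619 0.927 0.8053 0.9305 0.4262 0.6623 0.4405 0.8018 0.7588 0.8912 0.7517 0.8268 0.4834 0.6158",
  "C 0.0148 0.0077 0.0291 0.0148 0.3046 0.0112 0.022 0.0327 0.0148 0.022 0.0148 0.0184 0.0828 0.0291",
  "G 0.1114 0.0148 0.0077 0.0291 0.0935 0.1221 0.0184 0.1078 0.0685 0.0148 0.1543 0.0291 0.1114 0.14",
  "T 0.4119 0.0506 0.1579 0.0255 0.1758 0.2044 0.5192 0.0577 0.1579 0.072 0.0792 0.1257 0.3224 0.2151"
)

# The first column holds the base letter, the remaining 14 the probabilities.
tab <- read.table(text = motif_lines, row.names = 1)
pwm <- as.matrix(tab)
colnames(pwm) <- NULL

# Sanity check: every position (column) should sum to roughly 1.
stopifnot(all(abs(colSums(pwm) - 1) < 0.01))
dim(pwm)
```

A matrix like this is the kind of object one would then try to hand to `GADEM()` as the seeded motif; the exact container the function expects (for example a list of matrices) should be checked against its man page.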
Hi Charles,
The `GADEM()` function has a `seed` argument, and, according to its man page, "when a seed is specified, the run results are deterministic". This is a good feature that all randomized algorithms in Bioconductor are expected to have in order to allow reproducible research. Are you sure the non-deterministic behavior observed by the OP is not a bug?
Thanks,
H.
Hi Hervé,
There are 2 types of seeds with the `GADEM()` function: the `seed` argument you mention, which makes the results deterministic, and the `Spwm` param, which lets the user supply a motif as a starting point for the genetic algorithm (the other option is to let the `GADEM()` function generate the starting motifs from the most frequent k-mers in the sequences). Based on the initial question, I assumed Vinod was talking about the `Spwm` param. If that's not the case, then it's clearly a bug, as you said.
Yes, Vinod is saying that he's using a seeded motif (and is showing the motif). Are you saying that when the user gives the algorithm a seeded motif, the algorithm is not deterministic anymore? Just to clarify, deterministic means that 2 runs with exactly the same input (in particular the same `seed` and the same `Spwm` args) will produce the same output.
H.
Hi Hervé,
What I meant is that if the user *only* provides a seeded motif through the `Spwm` arg, then it's not deterministic (a seeded motif but no seed). The `seed` arg should determine whether the algorithm is deterministic, independently of the values of any other args.
In the case of the OP, I assumed the `Spwm` arg was used without the `seed` arg since the results were different after each run. But I could be wrong, and in that case it would be a bug like you mention in your first comment.
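One way to settle this empirically is to run the function twice with identical inputs and compare the results. The helper below is a hypothetical sketch, not something from the package: `run_twice` is a made-up name, wrapping the matrix in `list()` for `Spwm` is an assumption, and so is using `consensus()` to summarize each run. Check all three against the rGADEM documentation before relying on it.

```r
# Hypothetical helper: runs GADEM() twice with the same sequences, the same
# seeded motif and the same seed, then reports whether the consensus motifs agree.
# Assumptions: Spwm accepts a list of matrices, and consensus() is available
# for the returned object.
run_twice <- function(seqs, pwm, seed = 1) {
  if (!requireNamespace("rGADEM", quietly = TRUE)) {
    stop("rGADEM is not installed")
  }
  one_run <- function() {
    res <- rGADEM::GADEM(seqs, verbose = 1, seed = seed, Spwm = list(pwm))
    rGADEM::consensus(res)
  }
  identical(one_run(), one_run())
}
```

Calling `run_twice(seqs, pwm)` with the `DNAStringSet` and the seed motif should return `TRUE` if the runs are deterministic for that input; a `FALSE` with a non-NULL `seed` would point towards a bug.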
The `seed` argument has a default value of 1, which means that if the user doesn't supply it, it will be set to 1. Are you saying that when `seed=1` the algorithm is not deterministic? Is the value 1 treated in a special way? When I look at the implementation of the `GADEM()` function, it doesn't seem so: what I see is that the call to `set.seed(seed)` is skipped only if `seed` is set to NULL. So it looks like the algorithm is non-deterministic only when the user supplies `seed=NULL`. As a consequence, if the user *only* provides a seeded motif through the `Spwm` arg (i.e. a seeded motif but no seed), then the algorithm should be deterministic. Am I missing something?
H.
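The logic described above can be illustrated with a toy function in base R. This is a stand-in written for illustration only, not the rGADEM implementation: `toy_search` is a made-up name, and it simply draws random numbers after applying the same "skip `set.seed()` only when `seed` is NULL" pattern.

```r
# Toy stand-in for a randomized algorithm; NOT the actual GADEM() code.
toy_search <- function(n = 5, seed = 1) {
  # The RNG is re-seeded on every call unless the caller passes seed = NULL.
  if (!is.null(seed)) set.seed(seed)
  sample.int(100, n)
}

# Default seed = 1 is applied on each call, so two calls agree.
identical(toy_search(), toy_search())

# With seed = NULL the RNG state carries over between calls,
# so two calls will usually differ.
identical(toy_search(seed = NULL), toy_search(seed = NULL))
```

Under this pattern, leaving `seed` at its default is enough for reproducibility, which matches the reading of the implementation given above.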