Ross Patterson
While performing copy number analysis on data segmented with the DNAcopy package, I have noticed variation in the output and was hoping someone here could shed some light on it. Specifically, when I run the DNAcopy segmentation on the exact same input data multiple times, the resulting segment output sometimes contains "extra" segments, caused by the discovery of "extra" breakpoints. In fact, the output is never exactly identical from one run to the next.

Digging into the source code a little, I saw what appeared to be calls to random number generating functions. Not being very familiar with Fortran, I could not tell how or why these numbers were being used, or even whether they are the source of the segmentation discrepancies.

I know that in the last few years there have been some changes to the segmentation algorithm to allow it to run in near-linear time. Did that require introducing non-deterministic behavior? Is there a way to force the segmentation algorithm to run deterministically, so that the output can be reproduced identically every time the segmentation is run?
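
For reference, here is a minimal sketch of the kind of check I have in mind. The simulated log-ratios, the sample name "s1", and the seed values are all made up for illustration and stand in for my real input. The second half assumes that the random numbers used during segmentation are drawn from R's own generator, so that calling set.seed() before segment() would control them; that is the part I would like confirmed.

```r
library(DNAcopy)

# Hypothetical simulated log-ratios standing in for the real input:
# a gained region in the middle of an otherwise neutral chromosome.
set.seed(1)
genomdat <- c(rnorm(100, 0, 0.2), rnorm(50, 0.6, 0.2), rnorm(100, 0, 0.2))
chrom    <- rep(1, length(genomdat))
maploc   <- seq_along(genomdat)
cna <- CNA(genomdat, chrom, maploc, data.type = "logratio", sampleid = "s1")

# Two runs on identical input without reseeding in between.
# On noisy real data these can disagree on breakpoints.
run1 <- segment(cna, verbose = 0)
run2 <- segment(cna, verbose = 0)
unseeded_match <- identical(run1$output, run2$output)

# The same two runs, resetting R's random number generator before each one.
# Assumption: segment() takes its random numbers from R's generator.
set.seed(42)
run3 <- segment(cna, verbose = 0)
set.seed(42)
run4 <- segment(cna, verbose = 0)
seeded_match <- identical(run3$output, run4$output)
```

If seeded_match is TRUE while unseeded_match can be FALSE on real data, then fixing the seed before each call would be enough for my purposes.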
Thank you in advance for your help,
Ross Patterson