I recently discovered that the application of (at least) `norm2Filter()` is not consistent across replicate runs. I've pasted an example below. In the example dataset the differences are small (just a few events), but in my much larger experimental datasets the number of events changes by the hundreds, which can significantly alter some of the downstream analysis.
```r
library(flowCore)
library(flowViz)  # provides the xyplot() method for flowFrame objects

## Loading example data
dat <- read.FCS(system.file("extdata", "0877408774.B08", package = "flowCore"))
n2f <- norm2Filter(filterId = "myNorm2Filter", x = list("FSC-H", "SSC-H"),
                   scale.factor = 1)
xyplot(`FSC-H` ~ `SSC-H`, data = dat, filter = n2f, smooth = FALSE,
       xbin = 256, stats = TRUE)

## Same filter, inconsistent subsetting.
sapply(1:15, function(x) {
  fres <- Subset(dat, n2f)
  nrow(fres)
})
```
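For reference, the run-to-run spread can be quantified directly (a minimal sketch reusing `dat` and `n2f` from above; the exact counts will of course vary from run to run):

```r
## Collect event counts from repeated applications of the same filter.
counts <- sapply(1:15, function(x) nrow(Subset(dat, n2f)))
range(counts)           # spread of event counts across replicates
length(unique(counts))  # values > 1 confirm non-deterministic filtering
```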
I soon realized that if I call `set.seed()` immediately before each subset, the issue goes away: the same number of events (and presumably the same events) is returned every time.
```r
sapply(1:15, function(x) {
  set.seed(1)
  fres <- Subset(dat, n2f)
  nrow(fres)
})
```
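Note, though, that this only makes the result reproducible, not seed-independent: different seeds presumably pick out different events and so can return slightly different subsets (a minimal sketch, again reusing `dat` and `n2f`):

```r
## Each seed fixes one particular, possibly different, subset.
sapply(1:5, function(s) {
  set.seed(s)
  nrow(Subset(dat, n2f))
})
```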
Is this because `Subset()`, in combination with `norm2Filter()`, fits the filter on some kind of randomly selected "training set" of events? How can I modify `norm2Filter()` and/or `Subset()` to use the WHOLE dataset, so that my analysis is not sensitive to the RNG?
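In the meantime I am working around this with a small wrapper that fixes the seed locally and then restores the previous RNG state, so surrounding code is unaffected (a minimal base-R sketch; `subsetDeterministic` is a hypothetical helper name, not part of flowCore):

```r
## Hypothetical helper: run Subset() under a fixed seed, then restore the
## caller's RNG state (when one existed) so surrounding code is unaffected.
subsetDeterministic <- function(x, filter, seed = 1) {
  has_seed <- exists(".Random.seed", envir = globalenv())
  old_seed <- if (has_seed) get(".Random.seed", envir = globalenv()) else NULL
  on.exit(if (has_seed)
    assign(".Random.seed", old_seed, envir = globalenv()))
  set.seed(seed)
  Subset(x, filter)
}

## Identical event counts on every call.
sapply(1:15, function(x) nrow(subsetDeterministic(dat, n2f)))
```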