WARNING: remove set.seed usage in R code
xyluo1991 • 0
@xyluo1991-16247

Dear All,

BiocCheck gave me the warning "WARNING: Remove set.seed usage in R code" because I call set.seed(seed) inside my R function to make results reproducible, where "seed" is an argument of the function that the user can specify. To pass BiocCheck I would have to remove set.seed(seed) from the function, but I still want to give the user the option to choose a seed. How should I resolve this issue?

Thank you very much for your help!

Best,

Xiangyu 

R bioccheck • 4.9k views
shepherl 4.1k
@lshep
United States

Generally we recommend that set.seed be called in the documentation and outside the function. Not only does this clearly show the user that a seed is used, but it also gives you room to explain why the seed is used.


x <- function() {
    # some code that draws random numbers
}

set.seed(123)

x()

You could keep the seed argument in your function and clearly document it. When you submit your package to the issue tracker, explain to the reviewer why a seed is set and your justification for keeping it in the function. Whether this is allowed is at your reviewer's discretion, and they may insist on the former solution.


In addition to Lori's answer, here's a little anecdote.

I often perform simulations with randomly generated data to test the performance of various algorithms. I usually generate some data, test the method and compute some measure of performance; and repeat this for several iterations to ensure that I get representative estimates of the metric of interest. At one point, I noticed that the standard deviation of my metrics was extremely low. Why? Because someone had put set.seed inside their function, which affects the entire R session after the function call - this meant that my "randomly" generated data was always the same after the second iteration!

In short, it's always easy for users to call set.seed if they want to. But putting the set.seed inside functions can quietly lead to surprising side-effects in downstream code involving randomness. Moreover, it's much harder to "uncall" set.seed. Hence the advice from BiocCheck to not put set.seed inside the function.
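The effect is easy to reproduce. The sketch below is only an illustration of the anecdote, not the original code: bad_method is a made-up function that calls set.seed internally, and the loop mimics a small simulation study.

```r
# Hypothetical function that resets the RNG internally, as in the anecdote.
bad_method <- function(x) {
    set.seed(42)          # side effect: resets the RNG state of the whole session
    mean(x) + rnorm(1)    # some computation that uses random numbers
}

sims <- numeric(5)
for (i in seq_len(5)) {
    x <- rnorm(10)        # supposedly fresh random data for this iteration
    bad_method(x)         # leaves the RNG in the same state every time
    sims[i] <- mean(x)    # summary of the data that was generated
}

# From the second iteration onwards the data are identical,
# so the spread across those iterations collapses to zero.
sd(sims[-1])
```

Every call to bad_method leaves the generator in the same state, so each later rnorm(10) starts from the same point and returns the same numbers.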


The proper way would be to test whether .Random.seed exists, save and restore it upon exit. In my packages (which live on CRAN, not Bioconductor) I also tend to allow the user to request that the random seed not be set within the function, by supplying NULL to the argument randomSeed below.

foo = function(..., randomSeed=1)
{
    if (!is.null(randomSeed)) {
        if (exists(".Random.seed")) {
            savedSeed = .Random.seed
            on.exit(.Random.seed <<-savedSeed)
        }
        set.seed(randomSeed)
    }
    # actual code...
}

It seems by setting the random seed (I don't know the context, so could be off-base here) you're somehow overstating the reproducibility of foo() in the manner illustrated by Aaron's anecdote; it seems better to have NULL as the default?

Artificial, but

f = function() {
    .Random.seed <- 1
    function() {
        seed <- .Random.seed
        on.exit(.Random.seed <<- seed)
        rnorm(10)
    }
}

modifies the .Random.seed of the generator.

set.seed(123)
xx <- .Random.seed
res <- f()()
identical(xx, .Random.seed)  # FALSE

Maybe it's safer (since the user can manipulate the parent environment but not the location of .GlobalEnv in the search() path) with

f = function() {
    .Random.seed <- 1
    function() {
        seed <- get(".Random.seed", 1)
        on.exit(assign(".Random.seed", seed, 1))
        rnorm(10)
    }
}
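A sketch along the same lines that also covers the case where no seed exists yet (for example, a fresh session before any random draw). The helper name with_temp_seed is made up for illustration and does not come from any package mentioned in this thread; it addresses the global environment through globalenv() instead of a search-path position.

```r
# Hypothetical helper: evaluate `expr` under a temporary seed, then put the
# global RNG state back exactly as it was found.
with_temp_seed <- function(seed, expr) {
    genv <- globalenv()
    had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
    if (had_seed) {
        old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
    }
    on.exit({
        if (had_seed) {
            assign(".Random.seed", old_seed, envir = genv)
        } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
            # No RNG state existed before the call, so remove the one created here
            rm(".Random.seed", envir = genv)
        }
    })
    set.seed(seed)
    expr  # lazily evaluated: forced here, after the seed has been set
}

set.seed(123)
before <- .Random.seed
a <- with_temp_seed(1, rnorm(3))
b <- with_temp_seed(1, rnorm(3))
identical(a, b)                  # same temporary seed, same draws
identical(before, .Random.seed)  # the caller's RNG state is untouched
```

Because expr is a lazily evaluated argument, the random draws happen only after set.seed(seed) has run, and on.exit restores the caller's state even if expr throws an error.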

You're right, I didn't think of the possibility of calling code defining its own .Random.seed (which is probably not very frequent but certainly possible).


Thank you very much for the great replies! I have removed set.seed from my function and now set the seed explicitly outside it. I greatly appreciate these helpful comments!
