Yes, I would think this behavior is intentional, but obviously I
don't know for sure. Looking at the code:

> parallel::clusterSetRNGStream
function (cl = NULL, iseed = NULL)
{
    cl <- defaultCluster(cl)
    oldseed <- if (exists(".Random.seed", envir = .GlobalEnv,
        inherits = FALSE))
        get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
    else NULL
    RNGkind("L'Ecuyer-CMRG")
    if (!is.null(iseed))
        set.seed(iseed)
    nc <- length(cl)
    seeds <- vector("list", nc)
    seeds[[1L]] <- .Random.seed
    [...]

You'll find that:

1. The stream of RNG seeds originates from .Random.seed.
2a. 'iseed' is only applied if non-NULL, in which case it changes the
starting .Random.seed.
2b. If iseed = NULL, then .Random.seed is whatever it was when you
called the function.

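To make the points above concrete, here is a minimal sketch that needs
no cluster. The helper first_stream_seed() is hypothetical (it is not
part of 'parallel'); it only mimics the opening lines of the quoted
code and, as an assumption on my part, puts the saved master state back
before returning:

```r
library(parallel)

# Hypothetical helper, NOT part of 'parallel': returns the first stream
# seed derived from the current master RNG state, then restores that state.
first_stream_seed <- function(iseed = NULL) {
  oldseed <- if (exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
    get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  else NULL
  RNGkind("L'Ecuyer-CMRG")
  if (!is.null(iseed)) set.seed(iseed)
  seed <- get(".Random.seed", envir = .GlobalEnv, inherits = FALSE)
  # Assumption: restore the master's state, mirroring the saved 'oldseed'
  if (is.null(oldseed)) {
    rm(".Random.seed", envir = .GlobalEnv)
  } else {
    assign(".Random.seed", oldseed, envir = .GlobalEnv)
  }
  seed
}

set.seed(1)
s1 <- first_stream_seed()     # iseed = NULL: derived from current state
s2 <- first_stream_seed()     # master state untouched, so same seed again
invisible(sample.int(1))      # forward the master RNG state
s3 <- first_stream_seed()     # now derived from a different state

identical(s1, s2)             # TRUE
identical(s1, s3)             # FALSE
identical(first_stream_seed(42), first_stream_seed(42))  # TRUE
```

With iseed = NULL the derived seed depends only on the master's
.Random.seed, so repeated calls give the same streams until that state
moves.
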
If you use iseed = NULL, then you need to forward the RNG state
(=.Random.seed) yourself. Here's an example:
set.seed(1)
library(parallel)
cl <- parallel::makeCluster(5)
str(.Random.seed)
# int [1:626] 10403 624 -169270483 -442010614 -603558397 -222347416 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1] 7 4 2 10 10
str(.Random.seed)
# int [1:626] 10403 624 -169270483 -442010614 -603558397 -222347416 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1] 7 4 2 10 10
## Forward RNG state
sample.int(1)
# [1] 1
str(.Random.seed)
# int [1:626] 10403 1 1654269195 -1877109783 -961256264 1403523942 ...
clusterSetRNGStream(cl, iseed = NULL)
parSapply(cl, 1:5, function(i) sample(1:10, 1))
# [1] 8 6 1 7 5
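
One possible way to do the forwarding without a dummy draw, offered
only as a sketch and not as something the 'parallel' documentation
recommends: draw 'iseed' itself from the master RNG. The draw advances
the master's state, so each call hands the workers new streams, while
the run as a whole should stay reproducible from the initial
set.seed(). The cluster size of 2 and the use of runif() are my own
choices for illustration:

```r
library(parallel)

set.seed(1)
cl <- makeCluster(2)

# Each sample.int() call consumes master RNG state, so the two calls
# below pass different 'iseed' values to clusterSetRNGStream().
clusterSetRNGStream(cl, iseed = sample.int(.Machine$integer.max, 1L))
x1 <- parSapply(cl, 1:2, function(i) runif(1))

clusterSetRNGStream(cl, iseed = sample.int(.Machine$integer.max, 1L))
x2 <- parSapply(cl, 1:2, function(i) runif(1))

stopCluster(cl)

identical(x1, x2)  # should be FALSE: the workers were re-seeded differently
```
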
FYI, you see a similar behavior with parallel::mclapply():
set.seed(1)
library(parallel)
RNGkind("L'Ecuyer-CMRG")
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -1.2673735 0.9045952 1.9502072
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -1.2673735 0.9045952 1.9502072
## Forward RNG state
sample.int(1)
# [1] 1
unlist(parallel::mclapply(1:2, function(n) rnorm(n), mc.set.seed = TRUE))
# [1] -0.09117479 -1.07803714 0.13924063

I can see pros and cons with this behavior, but I think the default is
risky. For instance, it's not hard to imagine an implementation of a
resampling algorithm where you have the option to run it via lapply()
or via parallel::mclapply(); there is a non-zero probability that
such an implementation produces identical samples on repeated calls.

Proper parallel RNG can be tricky.

/Henrik
On Fri, Jun 7, 2019 at 7:09 AM Colin Gillespie <csgillespie at gmail.com>
wrote:
>
> Dear All,
>
> Is the following expected behaviour?
>
> set.seed(1)
> library(parallel)
> cl = makeCluster(5)
> clusterSetRNGStream(cl, iseed = NULL)
> parSapply(cl, 1:5, function(i) sample(1:10, 1))
> # 7 4 2 10 10
> clusterSetRNGStream(cl, iseed = NULL)
> parSapply(cl, 1:5, function(i) sample(1:10, 1))
> # 7 4 2 10 10
> stopCluster(cl)
>
> The documentation could be read either way, e.g.
>
> * iseed: An integer to be supplied to set.seed, or NULL not to set
> reproducible seeds.
>
> From Details
>
> .... optionally setting the seed of the streams by set.seed(iseed)
> (otherwise they are set from the current seed of the master process:
> after selecting the L'Ecuyer generator).
>
> As may be guessed, this caught me out, since I was expecting the same
> behaviour as set.seed(NULL).
>
> Thanks
>
> Colin
>
> ----------
>
> R version 3.6.0 (2019-04-26)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 18.04.2 LTS