In a certain arcane context I wanted to look at the result of applying rbinom() to two versions of the probability argument, the versions differing in only a small number (three) of entries. I thought that if I called set.seed() (with the same argument) prior to each call to rbinom(), I would get results that differed only where the probabilities differed. Most of the time this appears to be true: set.seed(17) # Or any other seed, in my experience. p0 <- runif(147) p1 <- p0 p1[c(26,65,131)] <- 0.5 set.seed(42) r0 <- rbinom(147,1000,p0) set.seed(42) r1 <- rbinom(147,1000,p1) which(r1 != r0) [1] 26 65 131 But there is a particular probability vector for which the expected result does not hold. I have attached this vector in a file p.weird.txt. Access it via p.weird <- dget("p.weird.txt"). If I do: p0 <- p.weird p1 <- p0 p1[c(26,65,131)] <- 0.5 set.seed(42) r0 <- rbinom(147,1000,p0) set.seed(42) r1 <- rbinom(147,1000,p1) which(r1 != r0) then I get [1] 26 65 66 72 73 74 75 76 131 so the results differ (inexplicably, to me) at indices 66, 72, 73, 74, 75, and 76 as well as at the indices at which they would be expected to differ. There is "pattern" or "structure" in p.weird (try plot(p.weird) to see it) but why should that matter? I have just now discerned that if I (slightly) *round* the values of p.weird: p.round <- round(p.weird,12) then going through the rbinom() exercise with p.round gives the expected results, i.e. r0 and r1 differ only at indices 26, 65 and 131. Note that although p.round and p.weird are "slightly different":> > range(p.round-p.weird) > [1] -4.749534e-13 4.686251e-13they differ by less than sqrt(.Machine$double.eps) whence all.equal() says that they are "the same". I am completely unable to understand how there could be a mechanism by which the behaviour of rbinom() would be affected in this way. Can anyone provide me with enlightenment? Ta. cheers, Rolf Turner -- Honorary Research Fellow Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276 -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: p.weird.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20220730/23c2c995/attachment.txt>
Rolf,> On Jul 29, 2022, at 4:36 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote: > > In a certain arcane context I wanted to look at the result of > applying rbinom() to two versions of the probability argument, > the versions differing in only a small number (three) of entries. > > I thought that if I called set.seed() (with the same argument) prior to > each call to rbinom(), I would get results that differed only where the > probabilities differed. Most of the time this appears to be true: > > set.seed(17) # Or any other seed, in my experience. > p0 <- runif(147) > p1 <- p0 > p1[c(26,65,131)] <- 0.5 > set.seed(42) > r0 <- rbinom(147,1000,p0) > set.seed(42) > r1 <- rbinom(147,1000,p1)[rest deleted] The fillip is that rbinom may use a variable number of draws from a uniform to generate the binomial. You can read the Kachitvichyanukul and Schmeiser article or hunt through the code to see what it does, but brute force exploration will give you a sense: set.seed(1) u1000 <- runif(1000) res <- sapply(u1000, function(pr){ set.seed(1) rbinom(1,1000,pr) u_next <- runif(1) j <- match(u_next, u1000) c( i, j-1, pr)} ) table(res[2,]) # how many draws range(res[3, res[2,] == 2]) # range when 2 draws plot(res[2,], res[3,]) Evidently, tails of `pr' only need one draw, but the interior needs two. So my guess is that the changes you made shifted the RNG stream. I guess you need to reset the seed for each trial. --- FWIW, I tried your p.weird example, but did not get the same results as you showed. Only elements 26, 65, and 131 differed. Note that for me:> RNGkind()[1] "Mersenne-Twister" "Inversion" "Rejection"> > sessionInfo()R version 4.2.0 (2022-04-22) Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Monterey 12.4 Matrix products: default BLAS: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] compiler_4.2.0 tools_4.2.0>HTH, Chuck