GILLIBERT, Andre
2021-Aug-12 09:51 UTC
[Rd] Problem in random number generation for Marsaglia-Multicarry + Kinderman-Ramage
Dear R developers, In my opinion, I discovered a severe flaw that occur with the combination of the Marsaglia-Multicarry pseudo-random number generator associated to the Kinderman-Ramage algorithm to generate normally distributed numbers. The sample program is very simple (tested on R-4.1.1 x86_64 on Windows 10): set.seed(1, "Marsaglia-Multicarry", normal.kind="Kinderman-Ramage") v=rnorm(1e7) poisson.test(sum(v < (-4)))$conf.int # returns c(34.5, 62.5) poisson.test(sum(v > (4)))$conf.int # returns c(334.2, 410.7) pnorm(-4)*1e7 # returns 316.7 There should be approximatively 316 values less than -4 and 316 values greater than +4, bug there are far too few values less than -4. Results are similar with other random seeds, and things are even more obvious with larger sample sizes. The Kinderman-Ramage algorithm is fine when combined to Mersenne-Twister, and Marsaglia-Multicarry is fine when combined with the normal.kind="Inversion" algorithm, but the combination of Marsaglia-Multicarry and Kinderman-Ramage seems to have severe flaws. R should at least warn for that combination ! What do you think? Should I file a bug report? -- Sincerely Andr? GILLIBERT [[alternative HTML version deleted]]
peter dalgaard
2021-Aug-12 13:07 UTC
[Rd] Problem in random number generation for Marsaglia-Multicarry + Kinderman-Ramage
With these matters, one has to be careful to distinguish between method error
and implementation error.
The reason for changing the RNG setup in R v. 1.7.0 was pretty much this kind of
unfortunate interaction between M-M and K-R. There are even more egregious
examples for the distribution of maxima of normal variables. Try e.g.
RNGversion("1.6.0") # Marsaglia-Multicarry, Kinderman-Ramage
s <- replicate(1e6,max(rnorm(10)))
plot(density(s))
(A further bug in K-R was fixed in 1.7.1, but that is tangential to this.)
A glimpse of the source of the problem is seen in the
"microcorrelations" in this:
RNGkind("Mar");m <- matrix(runif(4e7),2)
plot(m[1,],m[2,],xlim=c(0,1e-3),pch=".")
m <- matrix(runif(4e7),2)
points(m[1,],m[2,],pch=".")
These examples are from 2003, so the issue has been known for almost 2 decades.
However, to the best of our knowledge, the M-M RNG is a faithful implementation
of their method, so we have left the RNG in R's arsenal, in case someone
needed it for some specific purpose.
- pd
> On 12 Aug 2021, at 11:51 , GILLIBERT, Andre <Andre.Gillibert at
chu-rouen.fr> wrote:
>
> Dear R developers,
>
>
> In my opinion, I discovered a severe flaw that occur with the combination
of the Marsaglia-Multicarry pseudo-random number generator associated to the
Kinderman-Ramage algorithm to generate normally distributed numbers.
>
>
> The sample program is very simple (tested on R-4.1.1 x86_64 on Windows 10):
>
> set.seed(1, "Marsaglia-Multicarry",
normal.kind="Kinderman-Ramage")
> v=rnorm(1e7)
> poisson.test(sum(v < (-4)))$conf.int # returns c(34.5, 62.5)
> poisson.test(sum(v > (4)))$conf.int # returns c(334.2, 410.7)
> pnorm(-4)*1e7 # returns 316.7
>
>
> There should be approximatively 316 values less than -4 and 316 values
greater than +4, bug there are far too few values less than -4.
>
> Results are similar with other random seeds, and things are even more
obvious with larger sample sizes.
>
> The Kinderman-Ramage algorithm is fine when combined to Mersenne-Twister,
and Marsaglia-Multicarry is fine when combined with the
normal.kind="Inversion" algorithm, but the combination of
Marsaglia-Multicarry and Kinderman-Ramage seems to have severe flaws.
>
> R should at least warn for that combination !
>
> What do you think? Should I file a bug report?
>
> --
> Sincerely
> Andr? GILLIBERT
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com