Carl Witthoft
2011-Feb-02 23:01 UTC
[R] testing randomness of random number generators with student t-test?
Hi, the subject more or less says it all.

I freely admit to not having bothered to find some of the online papers about methods of testing the quality of random number generators, but in an idle moment I wondered what to expect from something like the following:

    randa <- runif(1000)
    randb <- runif(1000)
    t.test(randa, randb)$p.value
    var.test(randa, randb)$p.value

[repeat ad nauseam]

Is the range of p-values I get in any way related to the "quality" of the random number generator?

thanks
Carl
Barry Rowlingson
2011-Feb-02 23:45 UTC
On Wed, Feb 2, 2011 at 11:01 PM, Carl Witthoft <carl at witthoft.com> wrote:
> Is the range of p-values I get in any way related to the "quality" of the
> random number generator?

Well, yes. All pseudo-random number generators have a period, after which they come back to the start and begin churning out the same sequence again. Good PRNGs have an astronomically long period. If you have a PRNG with a period of 1000, or 500, or 200, etc., your two sets will be perfectly correlated...

You might want to read up on RANDU, the infamous poor PRNG:

http://en.wikipedia.org/wiki/RANDU

“We guarantee that each number is random individually, but we don’t guarantee that more than one of them is random.”

The other things to look at are the DieHard tests:
http://en.wikipedia.org/wiki/Diehard_test

Barry
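Both of Barry's points can be checked in a few lines of base R. This is an illustrative sketch only. `make_short_period_rng` is a hypothetical toy generator that cycles through a pool of 200 values, standing in for a PRNG with period 200. `randu` is a direct transcription of the RANDU recurrence x[k+1] = 65539 x[k] mod 2^31. The first part shows that with such a short period Carl's two samples come out identical, which pins both p-values at (about) 1. The second part shows RANDU's three-term relation. That defect lives in triples of consecutive values, so a test that only compares the means or variances of two samples cannot see it.

```r
## Hypothetical toy generator with a short period: it cycles through a
## fixed pool of `period` uniforms (standing in for a poor PRNG).
make_short_period_rng <- function(period, seed = 1) {
  set.seed(seed)
  pool <- runif(period)   # one full cycle, drawn once
  pos <- 0                # current position within the cycle
  function(n) {
    idx <- ((pos + seq_len(n) - 1) %% period) + 1
    pos <<- (pos + n) %% period
    pool[idx]
  }
}

bad_runif <- make_short_period_rng(200)
randa <- bad_runif(1000)
randb <- bad_runif(1000)
identical(randa, randb)          # TRUE: 1000 is a multiple of the period
t.test(randa, randb)$p.value     # 1: identical samples give t = 0
var.test(randa, randb)$p.value   # essentially 1: variance ratio is 1

## RANDU: x[k+1] = 65539 * x[k] mod 2^31, odd seed; uniforms are x / 2^31.
## All intermediate values stay below 2^53, so double arithmetic is exact.
randu <- function(n, seed = 1) {
  x <- numeric(n)
  x[1] <- seed
  for (k in seq_len(n - 1)) x[k + 1] <- (65539 * x[k]) %% 2^31
  x
}

v <- randu(1000)
k <- seq_len(length(v) - 2)
## Since 65539^2 = 6 * 65539 - 9 (mod 2^31), every triple satisfies
## x[k+2] = 6 x[k+1] - 9 x[k] (mod 2^31), which is why consecutive
## triples of RANDU uniforms fall on 15 parallel planes in the unit cube.
all(((6 * v[k + 1] - 9 * v[k] - v[k + 2]) %% 2^31) == 0)   # TRUE
```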
Phil Spector
2011-Feb-03 00:18 UTC
Carl -

Under the null hypothesis, the distribution of p-values for a statistical test should be uniform over the range from 0 to 1, provided the test statistic is continuous and the test's assumptions hold. So while the individual p-values you see in an experiment like the one you carried out aren't really meaningful, their ensemble behaviour is. So if you did something like

    pvals <- replicate(10000, {
      randa <- runif(1000)
      randb <- runif(1000)
      t.test(randa, randb)$p.value
    })
    ks.test(pvals, "punif")

you'd expect the ks.test not to reject the hypothesis that the pvals follow a U(0,1) distribution.

As others have pointed out, there are many other issues regarding random number generation, but I think what I've described addresses the issue of the t.test probabilities.

- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
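Phil's check extends naturally to both of Carl's tests. In the sketch below, the helper `pvalue_check` and its defaults `reps = 2000`, `n = 1000` are illustrative choices, not something from the thread. It collects both p-values per replicate and applies ks.test() to each set.

One caveat matters here. var.test() is an F test that assumes normally distributed data, and uniform data have lighter tails than the normal (kurtosis 1.8 versus 3). So even with a perfect generator, its p-values are not uniform: they pile up toward 1, and the test rejects far less often than 5% of the time. Non-uniform var.test() p-values in this experiment therefore reflect the test's assumptions, not the generator. The t.test() p-values, by contrast, should look close to U(0,1) at n = 1000.

```r
## Sketch: collect p-values from Carl's two tests over many replicates.
## `gen` is any function(n) returning n numbers in (0, 1).
pvalue_check <- function(gen, reps = 2000, n = 1000) {
  p <- replicate(reps, {
    a <- gen(n)
    b <- gen(n)
    c(t = t.test(a, b)$p.value, var = var.test(a, b)$p.value)
  })
  ## p is a 2 x reps matrix with one row per test
  list(
    pvals = p,
    ## fraction of p-values below 0.05: near 0.05 for the t test, but
    ## well below 0.05 for the F test, which assumes normal data
    reject_rate = rowMeans(p < 0.05),
    ## Kolmogorov-Smirnov test of each row against U(0, 1)
    ks_p = apply(p, 1, function(x) ks.test(x, "punif")$p.value)
  )
}

set.seed(42)
res <- pvalue_check(runif)
res$reject_rate
res$ks_p
```

Passing a deliberately poor generator as `gen` (for example, one with a very short period) is the comparison that actually says something about generator quality.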
Dirk Eddelbuettel
2011-Feb-03 01:08 UTC
On 2 February 2011 at 23:45, Barry Rowlingson wrote:
| The other things to look at are the DieHard tests:
| http://en.wikipedia.org/wiki/Diehard_test

And/or the DieHarder tests by Robert G. Brown et al., and with them the RDieHarder package on CRAN, which wraps them. (And I need to catch up with the recent development in DieHarder.)

Dirk

-- 
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com
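Suites like DieHard and DieHarder probe dependence between successive numbers, which comparing two samples' means or variances does not. The sketch below illustrates that idea with a minimal base-R serial (pairs) test. `serial_test` is a hypothetical helper written for illustration; it is not code from DieHard, DieHarder or RDieHarder. It bins non-overlapping pairs of consecutive values into a k x k grid and compares the counts with a uniform expectation using chisq.test(). Sorting a good uniform sample leaves the set of values unchanged but makes consecutive values strongly dependent, and the pairs test catches that immediately.

```r
## Minimal serial (pairs) test, for illustration only.
## For independent U(0,1) values, each of the k^2 cells is equally likely.
serial_test <- function(u, k = 10) {
  stopifnot(length(u) >= 2, all(u > 0 & u < 1))
  m <- length(u) %/% 2
  x <- u[seq(1, 2 * m, by = 2)]   # first member of each pair
  y <- u[seq(2, 2 * m, by = 2)]   # second member of each pair
  cell <- (ceiling(x * k) - 1) * k + ceiling(y * k)   # cell index 1..k^2
  counts <- tabulate(cell, nbins = k^2)
  chisq.test(counts, p = rep(1 / k^2, k^2))$p.value
}

set.seed(1)
serial_test(runif(1e5))         # good generator: p-value behaves like U(0,1)
serial_test(sort(runif(1e5)))   # same values, sorted: pairs dependent, p ~ 0
```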
Petr Savicky
2011-Feb-03 12:29 UTC
On Wed, Feb 02, 2011 at 06:01:36PM -0500, Carl Witthoft wrote:
> Is the range of p-values I get in any way related to the "quality" of the
> random number generator?

Hi.

As already explained, the behaviour of the t.test() p-values in this case is consistent with the good quality of the Mersenne Twister generator used in R. The situation is slightly more complicated with ks.test(), due to the 32-bit precision of the random numbers, as discussed in the Note section of ?RNGkind. For example,

    n <- 100000
    ks.test(runif(n), runif(n))

typically produces a warning due to ties. This is not related to the quality of the randomness. The reason is that the random numbers have 32 bits, and, due to the birthday paradox, already among 2^16 numbers we get a collision with probability about 0.39. The null hypothesis should be changed to assume a uniform distribution on the numbers in (0, 1) with at most 32 bits.

See the section Random Number Generators of the CRAN Task View Probability Distributions by Christophe Dutang for information on CRAN packages related to random numbers.

As far as I know, the only tests that can distinguish Mersenne Twister numbers from truly random ones are linear complexity tests mod 2. This is discussed, for example, in section 7, "Conclusion, Future Work, and Open Issues", of

http://www.iro.umontreal.ca/~lecuyer/myftp/papers/horms.pdf

by P. L'Ecuyer. Applications that do not use bitwise mod 2 (XOR) operations are very unlikely to be affected by the linear dependencies mod 2 that these tests detect.
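Petr's birthday-paradox figures are easy to check numerically. The sketch assumes, as he states, that each uniform takes one of N = 2^32 equally likely values. `p_collision` is a hypothetical helper that computes the exact probability of at least one repeated value among n draws. The last line estimates how many tied pairs to expect in the pooled sample of his ks.test() example, which explains why the ties warning is the usual outcome.

```r
## Number of distinct values a 32-bit uniform can take (assumption
## taken from Petr's message).
N <- 2^32

## Exact probability of at least one repeat among n draws from N values,
## computed on the log scale for numerical stability.
p_collision <- function(n, N) {
  i <- seq_len(n) - 1
  1 - exp(sum(log1p(-i / N)))
}

p_collision(2^16, N)                     # about 0.39, as stated
1 - exp(-2^16 * (2^16 - 1) / (2 * N))    # usual approximation, nearly equal

## ks.test(runif(n), runif(n)) with n = 1e5 pools 2e5 values, i.e.
## choose(2e5, 2) pairs, each tied with probability 1/N.
choose(2e5, 2) / N                       # about 4.66 expected tied pairs
```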
If, however, an application does use bitwise XOR, then Mersenne Twister numbers may be predicted, because the generator is defined using XOR operations on a state consisting of the last 624 numbers. A simple demonstration of this known predictability is contained in

http://www.cs.cas.cz/~savicky/predict_MT/predict_MT.R

At first glance, this may look very bad. On the other hand, finding a relatively simple smooth function of 625 real variables whose expected value differs measurably between Mersenne Twister numbers and truly random ones would likely be an interesting mathematical discovery.

Petr Savicky.
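The generator state Petr refers to is visible from R. The sketch below shows only two things for the default Mersenne-Twister kind. First, .Random.seed holds a code for the generator kind, the current position, and the 624 state words. Second, restoring that vector reproduces the following numbers exactly. The sketch does not reconstruct the state from observed outputs, which is the predictability the script linked above demonstrates.

```r
## Make sure the Mersenne Twister is the active uniform generator.
RNGkind("Mersenne-Twister")
set.seed(2011)
length(.Random.seed)   # 626 = 1 (kind code) + 1 (position) + 624 words

state <- .Random.seed  # save the full generator state
first <- runif(5)

## Restore the saved state in the global environment, where R reads it.
assign(".Random.seed", state, envir = globalenv())
second <- runif(5)

identical(first, second)   # TRUE: the future stream is fixed by the state
```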