I am wondering whether there is a bug in rnorm. When generating rnorm(1000000) and counting the cases > 4 and the cases < (-4) I get rather unexpectedly low counts for the latter. The problem goes away when using qnorm(runif(1000000)). Fritz Scholz, PhD Applied Statistics Group Boeing Phantom Works fritz.scholz at pss.boeing.com 425-865-3623 Tu/We 206-542-6545 (most likely)
I don't see anything suspicious (R 2.0.1 on Windows XP Pro, Pentium M):> set.seed(1) > res <- replicate(50, {x <- rnorm(1e6); c(sum(x < -4), sum(x > 4))}) > apply(res, 1, summary)[,1] [,2] Min. 20.0 21.00 1st Qu. 27.0 30.00 Median 30.0 33.50 Mean 30.7 33.74 3rd Qu. 34.0 37.00 Max. 42.0 48.00> pnorm(-4) * 1e6[1] 31.67124> res2 <- replicate(50, {x <- qnorm(runif(1e6)); c(sum(x < -4), sum(x >4))})> apply(res2, 1, summary)[,1] [,2] Min. 18.00 19.00 1st Qu. 30.00 27.25 Median 32.00 31.00 Mean 32.60 31.38 3rd Qu. 35.75 35.00 Max. 47.00 45.00> RNGkind()[1] "Mersenne-Twister" "Inversion" Andy> From: Scholz, Fritz > > I am wondering whether there is a bug in rnorm. > When generating rnorm(1000000) and counting > the cases > 4 and the cases < (-4) I get rather > unexpectedly low counts for the latter. The problem goes away > when using qnorm(runif(1000000)). > > Fritz Scholz, PhD > Applied Statistics Group > Boeing Phantom Works > fritz.scholz at pss.boeing.com > 425-865-3623 > Tu/We 206-542-6545 (most likely) > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >
1) Why do you think there is a bug in rnorm. What is your expected numbers for cases > 4 or cases < -4 ? If you can show your calculations then maybe we can see if there is an error in how you calculated it or with R. 2) Please give a reproducible example. It might be low or high in one instance on a single run but if you do it several times you may not see a consistent bias. If you do, then there could be a bug. Otherwise it could be an element of randomness. B <- 10 # increase this if you can out1 <- matrix( nr=B, nc=2 ) out2 <- matrix( nr=B, nc=2 ) thr <- 4 set.seed(1) for(i in 1:B){ x <- rnorm(1000000) out1[ i, 1 ] <- mean( x > thr ) # empirical upper tail prob out1[ i, 2 ] <- mean( x < -1*thr ) # empirical lower tail prob y <- qnorm(runif(1000000)) out2[ i, 1 ] <- mean( y > thr ) out2[ i, 2 ] <- mean( y < -1*thr ) rm(x, y); print(i) }> out1[,1] [,2] [1,] 3.6e-05 3.7e-05 [2,] 3.2e-05 2.6e-05 [3,] 3.2e-05 2.9e-05 [4,] 3.7e-05 2.8e-05 [5,] 3.5e-05 2.5e-05 [6,] 4.3e-05 3.3e-05 [7,] 3.9e-05 3.0e-05 [8,] 3.2e-05 4.8e-05 [9,] 3.0e-05 3.1e-05 [10,] 3.4e-05 3.7e-05> out2[,1] [,2] [1,] 3.5e-05 2.4e-05 [2,] 2.4e-05 3.5e-05 [3,] 4.1e-05 2.6e-05 [4,] 2.5e-05 3.9e-05 [5,] 4.1e-05 2.5e-05 [6,] 4.0e-05 3.1e-05 [7,] 2.9e-05 2.7e-05 [8,] 3.1e-05 3.3e-05 [9,] 3.5e-05 2.5e-05 [10,] 2.7e-05 3.4e-05 The probability of observing any value above 4 or below -4 is pnorm(4, lower.tail=FALSE) = 3.167124e-05. There does not seem to be anything suspicious from the run of 10. 3) I think this question might be more appropriate to post this question to R-devel. Please use an informative subject line. Please read the posting guide. Thanks. Regards, Adai On Mon, 2005-02-21 at 15:45 -0800, Scholz, Fritz wrote:> I am wondering whether there is a bug in rnorm. > When generating rnorm(1000000) and counting > the cases > 4 and the cases < (-4) I get rather > unexpectedly low counts for the latter. The problem goes away > when using qnorm(runif(1000000)). > > Fritz Scholz, PhD > Applied Statistics Group > Boeing Phantom Works > fritz.scholz at pss.boeing.com > 425-865-3623 > Tu/We 206-542-6545 (most likely) > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html >
On Mon, 21 Feb 2005, Scholz, Fritz wrote:> I am wondering whether there is a bug in rnorm.I am wondering if there is a bug in your procedures, or if you used a very old version of R. If you call `bug', do have the courtesy to present your reasons. See the posting guide for other missing information, like the R version, and in this case> RNGkind()[1] "Mersenne-Twister" "Inversion" which shows the default.> When generating rnorm(1000000) and counting > the cases > 4 and the cases < (-4) I get rather > unexpectedly low counts for the latter. The problem goes away > when using qnorm(runif(1000000)).That is the algorithm used by default internally (since 1.7.0, mid-2003, it seems).> x <- rnorm(1000000); mean(x > 4); mean(x < -4)[1] 3.5e-05 [1] 2.8e-05> x <- rnorm(1000000); mean(x > 4); mean(x < -4)[1] 3.5e-05 [1] 3.5e-05> x <- rnorm(1000000); mean(x > 4); mean(x < -4)[1] 2.7e-05 [1] 2.9e-05> x <- rnorm(1000000); mean(x > 4); mean(x < -4)[1] 4.3e-05 [1] 3.7e-05 looks about right. As does cnt <- 0 for(i in 1:100) {x <- rnorm(1000000); cnt <- cnt + sum(x < -4)} cnt [1] 3157 which is (to a good approximation) Poisson(3167)> ppois(3157, 1e8*pnorm(-4))[1] 0.4332376 or more exactly> pbinom(3157, 1e8, pnorm(-4))[1] 0.4332365 However, this is not a good way to generate such random events, and has many times tripped up people: see the cover of my 1987 simulation book for the theory behind one famous (and published) instance. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
I was not able to reproduce the behavior reported in my message below. I was trying it on the same laptop in the same workspace using again R 2.0.0, but the low counts of 4 to 7 (witnesses by some of my colleagues) did not show up again. Please do not spend any more effort on this.>I am wondering whether there is a bug in rnorm. >When generating rnorm(1000000) and counting >the cases > 4 and the cases < (-4) I get rather >unexpectedly low counts for the latter. The problem goes away >when using qnorm(runif(1000000)).Fritz Scholz, PhD Applied Statistics Group Boeing Phantom Works fritz.scholz at pss.boeing.com 425-865-3623 Tu/We 206-542-6545 (most likely)