I am wondering whether there is a bug in rnorm. When generating rnorm(1000000) and counting the cases > 4 and the cases < (-4) I get rather unexpectedly low counts for the latter. The problem goes away when using qnorm(runif(1000000)). Fritz Scholz, PhD Applied Statistics Group Boeing Phantom Works fritz.scholz at pss.boeing.com 425-865-3623 Tu/We 206-542-6545 (most likely)
I don't see anything suspicious (R 2.0.1 on Windows XP Pro, Pentium M):> set.seed(1) > res <- replicate(50, {x <- rnorm(1e6); c(sum(x < -4), sum(x > 4))}) > apply(res, 1, summary)[,1] [,2] Min. 20.0 21.00 1st Qu. 27.0 30.00 Median 30.0 33.50 Mean 30.7 33.74 3rd Qu. 34.0 37.00 Max. 42.0 48.00> pnorm(-4) * 1e6[1] 31.67124> res2 <- replicate(50, {x <- qnorm(runif(1e6)); c(sum(x < -4), sum(x >4))})> apply(res2, 1, summary)[,1] [,2] Min. 18.00 19.00 1st Qu. 30.00 27.25 Median 32.00 31.00 Mean 32.60 31.38 3rd Qu. 35.75 35.00 Max. 47.00 45.00> RNGkind()[1] "Mersenne-Twister" "Inversion" Andy> From: Scholz, Fritz > > I am wondering whether there is a bug in rnorm. > When generating rnorm(1000000) and counting > the cases > 4 and the cases < (-4) I get rather > unexpectedly low counts for the latter. The problem goes away > when using qnorm(runif(1000000)). > > Fritz Scholz, PhD > Applied Statistics Group > Boeing Phantom Works > fritz.scholz at pss.boeing.com > 425-865-3623 > Tu/We 206-542-6545 (most likely) > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >
1) Why do you think there is a bug in rnorm. What is your expected
numbers for cases > 4 or cases < -4 ? If you can show your calculations
then maybe we can see if there is an error in how you calculated it or
with R.
2) Please give a reproducible example. It might be low or high in one
instance on a single run but if you do it several times you may not see
a consistent bias. If you do, then there could be a bug. Otherwise it
could be an element of randomness.
B <- 10 # increase this if you can
out1 <- matrix( nr=B, nc=2 )
out2 <- matrix( nr=B, nc=2 )
thr <- 4
set.seed(1)
for(i in 1:B){
x <- rnorm(1000000)
out1[ i, 1 ] <- mean( x > thr ) # empirical upper tail prob
out1[ i, 2 ] <- mean( x < -1*thr ) # empirical lower tail prob
y <- qnorm(runif(1000000))
out2[ i, 1 ] <- mean( y > thr )
out2[ i, 2 ] <- mean( y < -1*thr )
rm(x, y); print(i)
}
> out1
[,1] [,2]
[1,] 3.6e-05 3.7e-05
[2,] 3.2e-05 2.6e-05
[3,] 3.2e-05 2.9e-05
[4,] 3.7e-05 2.8e-05
[5,] 3.5e-05 2.5e-05
[6,] 4.3e-05 3.3e-05
[7,] 3.9e-05 3.0e-05
[8,] 3.2e-05 4.8e-05
[9,] 3.0e-05 3.1e-05
[10,] 3.4e-05 3.7e-05
> out2
[,1] [,2]
[1,] 3.5e-05 2.4e-05
[2,] 2.4e-05 3.5e-05
[3,] 4.1e-05 2.6e-05
[4,] 2.5e-05 3.9e-05
[5,] 4.1e-05 2.5e-05
[6,] 4.0e-05 3.1e-05
[7,] 2.9e-05 2.7e-05
[8,] 3.1e-05 3.3e-05
[9,] 3.5e-05 2.5e-05
[10,] 2.7e-05 3.4e-05
The probability of observing any value above 4 or below -4 is pnorm(4,
lower.tail=FALSE) = 3.167124e-05. There does not seem to be anything
suspicious from the run of 10.
3) I think this question might be more appropriate to post this question
to R-devel. Please use an informative subject line. Please read the
posting guide. Thanks.
Regards, Adai
On Mon, 2005-02-21 at 15:45 -0800, Scholz, Fritz wrote:> I am wondering whether there is a bug in rnorm.
> When generating rnorm(1000000) and counting
> the cases > 4 and the cases < (-4) I get rather
> unexpectedly low counts for the latter. The problem goes away
> when using qnorm(runif(1000000)).
>
> Fritz Scholz, PhD
> Applied Statistics Group
> Boeing Phantom Works
> fritz.scholz at pss.boeing.com
> 425-865-3623
> Tu/We 206-542-6545 (most likely)
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
>
On Mon, 21 Feb 2005, Scholz, Fritz wrote:> I am wondering whether there is a bug in rnorm.I am wondering if there is a bug in your procedures, or if you used a very old version of R. If you call `bug', do have the courtesy to present your reasons. See the posting guide for other missing information, like the R version, and in this case> RNGkind()[1] "Mersenne-Twister" "Inversion" which shows the default.> When generating rnorm(1000000) and counting > the cases > 4 and the cases < (-4) I get rather > unexpectedly low counts for the latter. The problem goes away > when using qnorm(runif(1000000)).That is the algorithm used by default internally (since 1.7.0, mid-2003, it seems).> x <- rnorm(1000000); mean(x > 4); mean(x < -4)[1] 3.5e-05 [1] 2.8e-05> x <- rnorm(1000000); mean(x > 4); mean(x < -4)[1] 3.5e-05 [1] 3.5e-05> x <- rnorm(1000000); mean(x > 4); mean(x < -4)[1] 2.7e-05 [1] 2.9e-05> x <- rnorm(1000000); mean(x > 4); mean(x < -4)[1] 4.3e-05 [1] 3.7e-05 looks about right. As does cnt <- 0 for(i in 1:100) {x <- rnorm(1000000); cnt <- cnt + sum(x < -4)} cnt [1] 3157 which is (to a good approximation) Poisson(3167)> ppois(3157, 1e8*pnorm(-4))[1] 0.4332376 or more exactly> pbinom(3157, 1e8, pnorm(-4))[1] 0.4332365 However, this is not a good way to generate such random events, and has many times tripped up people: see the cover of my 1987 simulation book for the theory behind one famous (and published) instance. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
I was not able to reproduce the behavior reported in my message below. I was trying it on the same laptop in the same workspace using again R 2.0.0, but the low counts of 4 to 7 (witnesses by some of my colleagues) did not show up again. Please do not spend any more effort on this.>I am wondering whether there is a bug in rnorm. >When generating rnorm(1000000) and counting >the cases > 4 and the cases < (-4) I get rather >unexpectedly low counts for the latter. The problem goes away >when using qnorm(runif(1000000)).Fritz Scholz, PhD Applied Statistics Group Boeing Phantom Works fritz.scholz at pss.boeing.com 425-865-3623 Tu/We 206-542-6545 (most likely)