I couldnt find a previous posting on this in the archives, maybe it has already been mentioned. If you use a calculation to generate n observations in random number generators and you don't round to the nearest integer you may be generating n-1 numbers not n numbers as you thought depending on the storage precision of the calculation. e.g.> m <- 1000 > pi0 <- 0.9 > length(rnorm(m * (1-pi0)))[1] 99 # Should be 100> options(digits=16) > m * (1-pi0)[1] 99.99999999999997> identical(m*(1-pi0), 100)[1] FALSE Random number generation generates the floor of n observations, this feature occurs on R-1.8.1 on linux Redhat8, and winXP (also on Unix SPlus 3.4) for probably all of the random number generators. e.g.> length(rnorm(m*(1-pi0),mean=0,sd=1))[1] 99> length(rpois(m*(1-pi0),lambda=1))[1] 99> length(rbeta(m*(1-pi0),shape=1, shape2=2))[1] 99> length(rbinom(m*(1-pi0),size=1, prob=0.5))[1] 99> length(runif(m*(1-pi0),min=0, max=1))[1] 99 marcus Marcus Davy Bioinformatics ______________________________________________________ The contents of this e-mail are privileged and/or confidenti...{{dropped}}
"Marcus Davy" <MDavy at hortresearch.co.nz> writes:> I couldnt find a previous posting on this in the archives, maybe it has > already been mentioned. > > If you use a calculation to generate n observations in random number > generators and you don't round to the nearest integer you may be > generating n-1 numbers not n numbers as you thought depending on the > storage precision of the calculation. > > e.g. > > m <- 1000 > > pi0 <- 0.9 > > length(rnorm(m * (1-pi0))) > [1] 99 # Should be 100 > > options(digits=16) > > m * (1-pi0) > [1] 99.99999999999997 > > identical(m*(1-pi0), 100) > [1] FALSE > > Random number generation generates the floor of n observations, this > feature occurs on R-1.8.1 on linux Redhat8, and winXP (also on Unix > SPlus 3.4) > for probably all of the random number generators.Nothing to do with random number generation, everything to do with coercion of floats to integers. Hence, e.g.> as.integer(100*(1-0.9))[1] 9> numeric(100*(1-0.9))[1] 0 0 0 0 0 0 0 0 0 There are a couple of places where we do have hidden fuzz factors because people were getting bitten by imprecision effects a bit too easily, like in> 1:(100*(1-0.9))[1] 1 2 3 4 5 6 7 8 9 10 but in general we use the standard C rules and simply truncate. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907