Dear all, I have been trying to understand what runif() is telling me. I am generating lots of numbers (billions and billions (wow, I''ve dreamed about saying that for many years... :-) ), for a distribution that has the following quantile function: 1 / (2 * sqrt(1 - p)) (that is, the distribution has a lower cutoff) As you can imagine, this has rather heavy upper tail. I was looking at the largest values, and it looked as if the largest values appeared again and again. Now, it wasn''t in itself that large values were strange, since I''m generating so many numbers, but that the largest were very much larger than the second largest numbers, and that exactly the same number appeared again and again. First I thought it was a bug, and I''m sorry to have wasted r-devels time with a bug report. I started running the same simulation with different RNGs and they all seem to generate numbers in "quantized states". Then, I started to look into what runif() gives, and let it print 13 digits. In the below output, I use the "Mersenne-Twister" RNG and I have generated 1e+10 numbers (100000 at a time) and I print a line if it the number is above 10000 (my dist, the left coloumn), the right coloumn are runif() the corresponding outputs. [1] 3.276800000000e+04 9.999999997672e-01 [1] 13377.479981919865 0.999999998603 [1] 1.158523750296e+04 9.999999981374e-01 [1] 1.036215143684e+04 9.999999976717e-01 [1] 1.158523750296e+04 9.999999981374e-01 [1] 1.036215143684e+04 9.999999976717e-01 [1] 1.036215143684e+04 9.999999976717e-01 [1] 13377.479981919865 0.999999998603 So, it seems that the runif() outputs are "quantized" too. The question is: What is the reason for this? I have been playing with the tought that it may be connected to the finite number representation capabilites of a computer? As I said, all the RNGs seems to have similar characteristics. If this is the case, I have no problem. It''s cosmology I''m doing, so whether a number is 32768 or 29923 is (presumably) of little importance. However, I should understand what is going on here.... :-) Finally, I may get a "loss of precision"-problem in my coding of my quantile function, but I guess it is not that much of a concern. Hope somebody can help me with this. Best, Kjetil -- Kjetil Kjernsmo Graduate astronomy-student Problems worthy of attack University of Oslo, Norway Prove their worth by hitting back E-mail: kjetikj at astro.uio.no - Piet Hein Homepage <URL:http://www.astro.uio.no/~kjetikj/> Webmaster at skepsis.no -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Thu, 25 May 2000, Kjetil Kjernsmo wrote:> Dear all, > > I have been trying to understand what runif() is telling me. > I am generating lots of numbers (billions and billions (wow, I''ve dreamed > about saying that for many years... :-) ), for a distribution that has the > following quantile function: > 1 / (2 * sqrt(1 - p)) > (that is, the distribution has a lower cutoff) > As you can imagine, this has rather heavy upper tail. I was looking at the > largest values, and it looked as if the largest values appeared again and > again. Now, it wasn''t in itself that large values were strange, since I''m > generating so many numbers, but that the largest were very much larger > than the second largest numbers, and that exactly the same number appeared > again and again. First I thought it was a bug, and I''m sorry to have > wasted r-devels time with a bug report. > > I started running the same simulation with different RNGs and they all > seem to generate numbers in "quantized states". Then, I started to look > into what runif() gives, and let it print 13 digits. > In the below output, I use the "Mersenne-Twister" RNG and I have generated > 1e+10 numbers (100000 at a time) and I print a line if it the number is > above 10000 (my dist, the left coloumn), the right coloumn are runif() > the corresponding outputs. > [1] 3.276800000000e+04 9.999999997672e-01 > [1] 13377.479981919865 0.999999998603 > [1] 1.158523750296e+04 9.999999981374e-01 > [1] 1.036215143684e+04 9.999999976717e-01 > [1] 1.158523750296e+04 9.999999981374e-01 > [1] 1.036215143684e+04 9.999999976717e-01 > [1] 1.036215143684e+04 9.999999976717e-01 > [1] 13377.479981919865 0.999999998603 > > So, it seems that the runif() outputs are "quantized" too. The question > is: What is the reason for this? > I have been playing with the tought that it may be connected to the finite > number representation capabilites of a computer? As I said, all the RNGs > seems to have similar characteristics.Yes, they are all quantized (all to ca 2^-31 or 2^-32). Reason: that''s all we can assume for integer arithmetic. That is perfectly sufficient for a *stable* procedure for using them. As I said before, I don''t think the algorithm you have for making use of them is adequate. Now, you haven''t actually told us how you are generating numbers from your distribution (and you haven''t actually defined the distribution precisely enough so that I could program it), but I guess you are using inversion. Don''t: it is not adequate for your purposes. You want to make use of several random numbers if you want behaviour in the far tail. Alternatively, you could plug in a generator that had a lower quantization. The moral is a familiar one: computer results are almost always approximate, and you always have to watch out for the effects of the approximations. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
I don''t think this is a problem with the RNG. There are only 2^32 numbers (2^31 in some systems)that can be represented with a 32-bit floating point number. So, a rough back-of-the-envelope calculation gives me:> (1-.9999999975)*2^32[1] 10.73742 i.e. about 10-11 possible numbers bigger than .9999999975 which seems to be your threshold. So the results you get aren''t very surprising. # David Smith -- David M Smith <dsmith at splus.mathsoft.com> S-PLUS Product Manager, MathSoft DAPD, Seattle WA Tel: +1 (206) 283 8802 x360 Fax: +1 (206) 283 0347> -----Original Message----- > From: owner-r-help at stat.math.ethz.ch > [mailto:owner-r-help at stat.math.ethz.ch]On Behalf Of Kjetil Kjernsmo > Sent: Thursday, May 25, 2000 07:39 > To: R Help list > Subject: [R] Needed: Understading runif() output :-) > > > Dear all, > > I have been trying to understand what runif() is telling me. > I am generating lots of numbers (billions and billions (wow, > I''ve dreamed > about saying that for many years... :-) ), for a distribution > that has the > following quantile function: > 1 / (2 * sqrt(1 - p)) > (that is, the distribution has a lower cutoff) > As you can imagine, this has rather heavy upper tail. I was > looking at the > largest values, and it looked as if the largest values > appeared again and > again. Now, it wasn''t in itself that large values were > strange, since I''m > generating so many numbers, but that the largest were very much larger > than the second largest numbers, and that exactly the same > number appeared > again and again. First I thought it was a bug, and I''m sorry to have > wasted r-devels time with a bug report. > > I started running the same simulation with different RNGs and they all > seem to generate numbers in "quantized states". Then, I > started to look > into what runif() gives, and let it print 13 digits. > In the below output, I use the "Mersenne-Twister" RNG and I > have generated > 1e+10 numbers (100000 at a time) and I print a line if it the > number is > above 10000 (my dist, the left coloumn), the right coloumn are runif() > the corresponding outputs. > [1] 3.276800000000e+04 9.999999997672e-01 > [1] 13377.479981919865 0.999999998603 > [1] 1.158523750296e+04 9.999999981374e-01 > [1] 1.036215143684e+04 9.999999976717e-01 > [1] 1.158523750296e+04 9.999999981374e-01 > [1] 1.036215143684e+04 9.999999976717e-01 > [1] 1.036215143684e+04 9.999999976717e-01 > [1] 13377.479981919865 0.999999998603 > > So, it seems that the runif() outputs are "quantized" too. > The question > is: What is the reason for this? > I have been playing with the tought that it may be connected > to the finite > number representation capabilites of a computer? As I said, > all the RNGs > seems to have similar characteristics. > > If this is the case, I have no problem. It''s cosmology I''m doing, so > whether a number is 32768 or 29923 is (presumably) of little > importance. However, I should understand what is going on here.... :-) > > Finally, I may get a "loss of precision"-problem in my coding of my > quantile function, but I guess it is not that much of a concern. > > Hope somebody can help me with this. > > Best, > > Kjetil > -- > Kjetil Kjernsmo > Graduate astronomy-student Problems worthy > of attack > University of Oslo, Norway Prove their worth by > hitting back > E-mail: kjetikj at astro.uio.no - > Piet Hein > Homepage <URL:http://www.astro.uio.no/~kjetikj/> > Webmaster at skepsis.no > > > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. > -.-.-.-.-.-.-.-.- > r-help mailing list -- Readhttp://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. _._ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Thu, 25 May 2000, Prof Brian D Ripley wrote:>Yes, they are all quantized (all to ca 2^-31 or 2^-32). Reason: that''s >all we can assume for integer arithmetic.Yes, indeed.>That is perfectly >sufficient for a *stable* procedure for using them. As I said before, I >don''t think the algorithm you have for making use of them is adequate. >Now, you haven''t actually told us how you are generating numbers from your >distribution (and you haven''t actually defined the distribution precisely >enough so that I could program it), but I guess you are using inversion. >Don''t: it is not adequate for your purposes.Aha, now I understand! Indeed, I use inversion. My (cumulative) distribution function is F(mu) = 1 - 0.25 * mu^-2, for mu >= 0.5 and 0 for mu < 0. I have inverted it, to get the 1 / (2 * sqrt(1 - p) quantile function, then I use runif(n) for the p.> You want to make use of >several random numbers if you want behaviour in the far tail. >Alternatively, you could plug in a generator that had a lower quantization. > >The moral is a familiar one: computer results are almost always >approximate, and you always have to watch out for the effects of the >approximations.Yes, it is a valuable lesson to learn. It is not that often the effects of the approximation are that glaring. I''m thinking about if I can live with this approximation in this case. I guess I should have a more thorough look into it nevertheless. My lack of training in statistics strikes again, I have based my analysis on a brief discussion in Dudewicz & Mishra''s "Modern Mathematical Statistics". If someone can recommend a text that discuss this topic, I would be very happy... Best, Kjetil -- Kjetil Kjernsmo Graduate astronomy-student Problems worthy of attack University of Oslo, Norway Prove their worth by hitting back E-mail: kjetikj at astro.uio.no - Piet Hein Homepage <URL:http://www.astro.uio.no/~kjetikj/> Webmaster at skepsis.no -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
At 06:38 25-05-00, Kjetil Kjernsmo wrote:>Dear all, > >I have been trying to understand what runif() is telling me. >I am generating lots of numbers (billions and billions (wow, I''ve dreamed >about saying that for many years... :-) ), for a distribution that has the >following quantile function: > 1 / (2 * sqrt(1 - p)) >(that is, the distribution has a lower cutoff) >As you can imagine, this has rather heavy upper tail. I was looking at the >largest values, and it looked as if the largest values appeared again and >again. Now, it wasn''t in itself that large values were strange, since I''m >generating so many numbers, but that the largest were very much larger >than the second largest numbers, and that exactly the same number appeared >again and again. First I thought it was a bug, and I''m sorry to have >wasted r-devels time with a bug report. > >I started running the same simulation with different RNGs and they all >seem to generate numbers in "quantized states". Then, I started to look >into what runif() gives, and let it print 13 digits. >In the below output, I use the "Mersenne-Twister" RNG and I have generated >1e+10 numbers (100000 at a time) and I print a line if it the number is >above 10000 (my dist, the left coloumn), the right coloumn are runif() >the corresponding outputs. >[1] 3.276800000000e+04 9.999999997672e-01 >[1] 13377.479981919865 0.999999998603 >[1] 1.158523750296e+04 9.999999981374e-01 >[1] 1.036215143684e+04 9.999999976717e-01 >[1] 1.158523750296e+04 9.999999981374e-01 >[1] 1.036215143684e+04 9.999999976717e-01 >[1] 1.036215143684e+04 9.999999976717e-01 >[1] 13377.479981919865 0.999999998603 > >So, it seems that the runif() outputs are "quantized" too. The question >is: What is the reason for this?Mersenne Twister pseudo-random numbers are based on 32-bit unsigned integers. R uses 64-bit doubles, and if you want all bits random, you need to combine two uniforms. Something like x <- (1.0 - 2^-32) * first.uniform + 2 ^ -32 * second.uniform should work. For efficiency, use the actual values of the two constants, and make first.uniform and second.uniform vectors of 1000 or more uniforms. --Ivan Frohne -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._