This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. ---559023410-851401618-1218751024=:15885 Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE I didn't describe the problem clearly. It's about the number of distinct=20 values. So just ignore cycle issue. My tests were: RNGkind(kind=3D"Knuth-TAOCP"); sum(duplicated(runif(1e7))); #return 46552 RNGkind(kind=3D"Knuth-TAOCP-2002"); sum(duplicated(runif(1e7))); #return 46415 #These collision frequency suggested there were 2^30 distinct values by=20 birthday problem. RNGkind(kind=3D"Marsaglia-Multicarry"); sum(duplicated(runif(1e7))); #return 11682 RNGkind(kind=3D"Super-Duper"); sum(duplicated(runif(1e7))); #return 11542 RNGkind(kind=3D"Mersenne-Twister"); sum(duplicated(runif(1e7))); #return 11656 #These indicated there were 2^32 distinct values, which agrees with the=20 help info. RNGkind(kind=3D"Wichmann-Hill"); sum(duplicated(runif(1e7))); #return 0 #So for this method, there should be more than 2^32 distinct values. You may not get the exact numbers, but they should be close. So how to=20 explain above problem? I need generate a large sample without any ties, it seems to me=20 "Wichmann-Hill" is only choice right now. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D Shengqiao Li The Department of Statistics PO Box 6330 West Virginia University Morgantown, WV 26506-6330 =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D On Thu, 14 Aug 2008, Peter Dalgaard wrote:> Shengqiao Li wrote: >> Hello all, >>=20 >> I am generating large samples of random numbers. The RNG help page says:=20 >> "All the supplied uniform generators return 32-bit integer values that are=20 >> converted to doubles, so they take at most 2^32 distinct values and long=20 >> runs will return duplicated values." But I find that the cycles are not the=20 >> same as the 32-bit integer. >>=20 >> My test indicated that the cycles for Knuth's methods were 2^30 while=20 >> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated in=20 >> 10M numbers generated by runif using Wichmann-Hill. The other three methods=20 >> had cycle length of 2^32. >>=20 >> So, anybody can explain this? And any improvement to the implementation can=20 >> be made to increase the cycle length like the Wichmann-Hill method? >>=20 > What test? These are not simple linear congruential generators. Just because=20 > you get the same value twice, it doesn't mean that the sequence is repeating.=20 > Perhaps you should read the entire help page rather than just the note. > > --=20 > O__ ---- Peter Dalgaard =D8ster Farimagsgade 5, Entr.B > c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907 > >---559023410-851401618-1218751024=:15885--
Remember Wichmann-Hill is a composite generator. Its composition does take more than 2^32 distinct values. You still haven't identifed a problem here. The note is to warn that runif() does repeat within a cycle, because people wrote code assuming otherwise. It would be a poor use of runif() to rely on the low-order bits, and that's standard advice in the field. For a large sample of uniforms use something like the normal inversion does, e.g. 2^(-30) * (runif(N, 0, 2^30) %% 2^30 + runif(N)) Please do leave R-bugs out of this: we already have 4 entries as a result of your misunderstandings and false claims. On Thu, 14 Aug 2008, shli at stat.wvu.edu wrote:> This message is in MIME format. The first part should be readable text, > while the remaining parts are likely unreadable without MIME-aware tools. > > ---559023410-851401618-1218751024=:15885 > Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed > Content-Transfer-Encoding: QUOTED-PRINTABLE > > > I didn't describe the problem clearly. It's about the number of distinct=20 > values. So just ignore cycle issue. > > My tests were: > > RNGkind(kind=3D"Knuth-TAOCP"); > sum(duplicated(runif(1e7))); #return 46552 > > RNGkind(kind=3D"Knuth-TAOCP-2002"); > sum(duplicated(runif(1e7))); #return 46415 > > #These collision frequency suggested there were 2^30 distinct values by=20 > birthday problem. > > > RNGkind(kind=3D"Marsaglia-Multicarry"); > sum(duplicated(runif(1e7))); #return 11682 > > RNGkind(kind=3D"Super-Duper"); > sum(duplicated(runif(1e7))); #return 11542 > > RNGkind(kind=3D"Mersenne-Twister"); > sum(duplicated(runif(1e7))); #return 11656 > > #These indicated there were 2^32 distinct values, which agrees with the=20 > help info. > > RNGkind(kind=3D"Wichmann-Hill"); > sum(duplicated(runif(1e7))); #return 0 > > #So for this method, there should be more than 2^32 distinct values. > > You may not get the exact numbers, but they should be close. So how to=20 > explain above problem? > > I need generate a large sample without any ties, it seems to me=20 > "Wichmann-Hill" is only choice right now. > > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > Shengqiao Li > > The Department of Statistics > PO Box 6330 > West Virginia University > Morgantown, WV 26506-6330 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > On Thu, 14 Aug 2008, Peter Dalgaard wrote: > >> Shengqiao Li wrote: >>> Hello all, >>> =20 >>> I am generating large samples of random numbers. The RNG help page says:> =20 >>> "All the supplied uniform generators return 32-bit integer values that a> re=20 >>> converted to doubles, so they take at most 2^32 distinct values and long> =20 >>> runs will return duplicated values." But I find that the cycles are not > the=20 >>> same as the 32-bit integer. >>> =20 >>> My test indicated that the cycles for Knuth's methods were 2^30 while=20 >>> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated i> n=20 >>> 10M numbers generated by runif using Wichmann-Hill. The other three meth> ods=20 >>> had cycle length of 2^32. >>> =20 >>> So, anybody can explain this? And any improvement to the implementation > can=20 >>> be made to increase the cycle length like the Wichmann-Hill method? >>> =20 >> What test? These are not simple linear congruential generators. Just beca> use=20 >> you get the same value twice, it doesn't mean that the sequence is repeat> ing.=20 >> Perhaps you should read the entire help page rather than just the note. >> >> --=20 >> O__ ---- Peter Dalgaard =D8ster Farimagsgade 5, Entr.B >> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K >> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 >> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907 >> >> > ---559023410-851401618-1218751024=:15885-- > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
shli at stat.wvu.edu wrote:> This message is in MIME format. The first part should be readable text, > while the remaining parts are likely unreadable without MIME-aware tools. > > ---559023410-851401618-1218751024=:15885 > Content-Type: TEXT/PLAIN; charset=ISO-8859-1; format=flowed > Content-Transfer-Encoding: QUOTED-PRINTABLE > > > I didn't describe the problem clearly. It's about the number of distinct=20 > values. So just ignore cycle issue. > > My tests were: > > RNGkind(kind=3D"Knuth-TAOCP"); > sum(duplicated(runif(1e7))); #return 46552 > > RNGkind(kind=3D"Knuth-TAOCP-2002"); > sum(duplicated(runif(1e7))); #return 46415 > > #These collision frequency suggested there were 2^30 distinct values by=20 > birthday problem. >The birthday problem distribution applies to independent draws, but they are only pseudo-independent. I think the only ways to know for sure if there are 2^30 values are to look at the code, or run through a complete cycle. And you need to determine the cycle by looking at .Random.seed, not at the returned value.> > RNGkind(kind=3D"Marsaglia-Multicarry"); > sum(duplicated(runif(1e7))); #return 11682 > > RNGkind(kind=3D"Super-Duper"); > sum(duplicated(runif(1e7))); #return 11542 > > RNGkind(kind=3D"Mersenne-Twister"); > sum(duplicated(runif(1e7))); #return 11656 > > #These indicated there were 2^32 distinct values, which agrees with the=20 > help info. > >If there are 2^30 distinct values for the two generators above, that also agrees with the documentation.> RNGkind(kind=3D"Wichmann-Hill"); > sum(duplicated(runif(1e7))); #return 0 > > #So for this method, there should be more than 2^32 distinct values. > > You may not get the exact numbers, but they should be close. So how to=20 > explain above problem? >You haven't demonstrated what you claim, but if you look at the source, you'll see that in fact the man page is wrong: Wichmann-Hill is based on 3 integer values, which each take on approximately 15 bits of different values. So Wichmann-Hill could take nearly 2^45 different values (actually 30269*30307*30323). The source is in https://svn.r-project.org/R/trunk/src/main/RNG.c if you want to check the others.> I need generate a large sample without any ties, it seems to me=20 > "Wichmann-Hill" is only choice right now. >An alternative would be to construct a new value from two (or more) runif() values, but be careful that you don't mess up the distribution when you do that. Duncan Murdoch> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > Shengqiao Li > > The Department of Statistics > PO Box 6330 > West Virginia University > Morgantown, WV 26506-6330 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D> =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > > On Thu, 14 Aug 2008, Peter Dalgaard wrote: > > >> Shengqiao Li wrote: >> >>> Hello all, >>> =20 >>> I am generating large samples of random numbers. The RNG help page says:>>> > =20 > >>> "All the supplied uniform generators return 32-bit integer values that a>>> > re=20 > >>> converted to doubles, so they take at most 2^32 distinct values and long>>> > =20 > >>> runs will return duplicated values." But I find that the cycles are not >>> > the=20 > >>> same as the 32-bit integer. >>> =20 >>> My test indicated that the cycles for Knuth's methods were 2^30 while=20 >>> Wichmann-Hill's cycle was larger than 2^32! No numbers were duplicated i>>> > n=20 > >>> 10M numbers generated by runif using Wichmann-Hill. The other three meth>>> > ods=20 > >>> had cycle length of 2^32. >>> =20 >>> So, anybody can explain this? And any improvement to the implementation >>> > can=20 > >>> be made to increase the cycle length like the Wichmann-Hill method? >>> =20 >>> >> What test? These are not simple linear congruential generators. Just beca>> > use=20 > >> you get the same value twice, it doesn't mean that the sequence is repeat>> > ing.=20 > >> Perhaps you should read the entire help page rather than just the note. >> >> --=20 >> O__ ---- Peter Dalgaard =D8ster Farimagsgade 5, Entr.B >> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K >> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 >> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907 >> >> >> > ---559023410-851401618-1218751024=:15885-- > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >