(Ted Harding)
2010-Sep-04 20:54 UTC
[R] How to generate integers from uniform distribution with
On 04-Sep-10 19:27:54, Yi wrote:> Enh, I see. > It totally makes sense. > Thank you for your perfect explanation. > Enjoy the long weekend~ > YiYou're welcome! Earlier I tried an experiment with rejection sampling, which seems to work well for the case where you want mean of the sampled values to exactly be the mean of the range being sampled from. The number of tries, even for a large sample, was lower than I had anticipated. Example (sample size = 20000, sampled range is {-3,-2,-1,0,1,2,3}, mean = 0 therefore require that sum of sample = 0): S <- (-10) ; n <- 0 P <- ((-3):3) while((S != 0)){ x <- sample(P,20000,replace=TRUE,prob=c(1,1,1,1,1,1,1)) S <- sum(x) ; n <- (n+1) } n hist(x) To get your case of sampling the integers from (17:23) with sample mean always exactly 20, simply add 20 to the result x of the above loop. I found that I got values of n like: 126, 43, 403, 811, 385, 568, 590, 1758, 317, 456, 643, ... with every run being completed well within 2 seconds. Conditioning the value of the sum to be equal to the central value (0) of the range is also conditioning the value to be equal to the most probable value of the sum, so the runs will on average be shortest. Conditioning on a different mean (say mean = +1 for a sample of size 20000, so sum = 20000) would take much much longer (see below). One can use the Normal approximation to the distribution of the sum to estimate how long it might take. One value sampled from ((-3):3) has mean 0 and variance 4.66.. , so the sum of 20000 has mean 0 and variance 93333.33, hence the probability that the sum will be 0 is approximated by pnorm(0.5,0,sqrt(93333.33)) - pnorm(-0.5,0,sqrt(93333.33)) = 0.001305845 so the probability of success with one sample of 20000 is about 1/766 (which is consistent with the above results for n). On the other hand, conditioning on the mean of x being 1, i.e. on the sum being 20000, the chance of success is pnorm(20000.5,0,sqrt(93333.33)) - pnorm(19999.5,0,sqrt(93333.33)) which R computes as zero! Hence you have practically no chance of achieving this within any reasonable time. However, of course, the SE of the mean is sqrt((sum(P^2)/6)/20000) = 0.01527525, so you are aiming at a point which is about 60 SEs from the mean. The numbers are more reasonable if, instead of conditioning on the mean, you condition on the sum (not too far from 0), e.g. with sample size 20000 as before: 1 Sum must be 50, Prob(success) pnorm(50.5,0,sqrt(93333.33)) - pnorm(49.5,0,sqrt(93333.33)) = 0.001288472 ~= 1/776 2 Sum must be 100, Prob(success) pnorm(100.5,0,sqrt(93333.33)) - pnorm(99.5,0,sqrt(93333.33)) = 0.001237729 ~= 1/808 3 Sum must be 200, Prob(success) pnorm(200.5,0,sqrt(93333.33)) - pnorm(199.5,0,sqrt(93333.33)) = 0.001053971 ~= 1/949 4 Sum must be 500, Prob(success) pnorm(500.5,0,sqrt(93333.33)) - pnorm(499.5,0,sqrt(93333.33)) = 0.0003421745 ~= 1/2922 and so on. So even aiming at 500 it would on average only take about 3000 tries to hit it. After that it rapidly becomes less likely. Ted. On 04-Sep-10 19:27:54, Yi wrote:> Enh, I see. > It totally makes sense. > Thank you for your perfect explanation. > Enjoy the long weekend~ > Yi-------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 04-Sep-10 Time: 21:53:58 ------------------------------ XFMail ------------------------------