Dear list,

A while ago, Vadim asked for opinions on improving the efficiency of sample() with prob, i.e. sampling with replacement with weights (https://stat.ethz.ch/pipermail/r-devel/2004-September/030844.html). He did not post what he ended up doing about this problem, though.

I am having exactly the same problem. I need to sample with replacement from a population of size 10 million, with a fitness value for each individual. sample() is too slow for this purpose.

I implemented a bisection search algorithm. It is about 30% faster than the linear search one, but still not fast enough for me. (I can post the function if needed.) Does anybody have some good ideas? The only other idea I have is to use a faster (but worse) random number generator just for this application.

Thanks.
Bo
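[For reference, a minimal sketch of the bisection idea in plain R; this illustrates the general technique, not Bo's actual code, and the name sample_bisect is made up. The cumulative weights are built once, and each draw is located by binary search, here delegated to findInterval(), whose C code performs the bisection:]

## Sketch: weighted sampling with replacement via bisection over the
## cumulative weights. 'prob' need not sum to one.
sample_bisect <- function(n, prob) {
  cum <- cumsum(prob)                    # O(k) setup, k = length(prob)
  u <- runif(n, 0, cum[length(cum)])     # uniforms on [0, total weight)
  ## findInterval() does the binary search in C: O(n log k) overall.
  ## pmin() guards the (measure-zero) case u == total weight.
  pmin(findInterval(u, cum) + 1L, length(prob))
}

## Example: ten draws from three items with weights 1, 2 and 7
sample_bisect(10, c(1, 2, 7))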
In his "Introduction to Probability Models" Sheldon Ross describes (sec 11.4.1, 8th edition) the alias method for such weighted sampling. It is based on some decomposition of the original distribution (the weights) into a mixture of two-point distributions. I don't know the run-time complexity of the decomposition step. But once the decomposition is found the complexity of generating a sample is linear in the size of the sample. I haven't coded this because I had found an easier way to deal with the original problem I had at that time. HTH, Vadim> -----Original Message----- > From: r-devel-bounces at r-project.org > [mailto:r-devel-bounces at r-project.org] On Behalf Of Bo Peng > Sent: Tuesday, June 21, 2005 9:24 AM > To: r-devel at r-project.org > Subject: [Rd] efficiency of sample() with prob. > > Dear list, > > A while ago, Vadim asked opinions on improving efficiency of > sample() with prob, e.g. sample with replacement with weight. > ( > https://stat.ethz.ch/pipermail/r-devel/2004-September/030844.html ) He did not post what he ended up with this problem though.> > I am having exactly the same problem. I need to sample with > replacement from a population of size 10 million with fitness > values for each individual. sample() is too slow for this purpose. > > I implement a bisection search algorithm. It is about 30% > faster than the linear search one but still not good enough > to me. (I can post the function if needed). Does anybody have > some good ideas? The only other idea I have is using a faster > (but worse) random number generator just for this application. > > Thanks. > Bo > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >
> You chose to report just one extremely favourable example,
<ignore>
> I do think you are being `careless with the truth'.

I chose to report whatever I got and whatever I felt the results showed. It was not a scientific report, and it was up to you (the R team) to validate my results and make further investigations. When I was asked to do more thorough research in this field (and thus take responsibility for my results), I said no, since I did not have enough expertise or time. After all, I could have chosen not to report anything at all. If this mailing list only accepts serious contributions from field professionals, it is time for me to quit.

> Hmm. A sample every 100ns and generating samples several times faster
> than tabulating them is something to be impressed by.
<ignore>
> Not in our tests. Did you try my 5 out of 10,000 non-zero probabilities
> distribution, for example?

No, I did not try it. Intuitively, Walker's method may be slow there because of the memory allocation and table setup; it should be used with a large sample and a small prob vector. The bisection method should be quicker when the prob vector is large (so linear search is slow) and the sample size is relatively small. I thought bisection would be uniformly quicker than the linear search method, since there is no additional cost other than a few more variables.

If you have done enough experiments and know the fastest method in each case, it should be straightforward to implement a hybrid method where the best method is chosen based on the lengths of the sample and the prob vector (see the sketch below). I might have been reporting partial results, but the 80-times speed-up in some cases was not faked.

Cheers,
Bo
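[A sketch of that hybrid dispatch, reusing the sample_bisect() and alias_setup()/alias_draw() illustrations from earlier in the thread. The crossover rules below are placeholders to be tuned by benchmarking, not values anyone in the thread reported:]

## Sketch of the suggested hybrid: choose a method from the sample size n
## and the number of weights k. The n >= k rule and the constant 32 are
## assumptions for illustration, not measured crossover points.
sample_weighted <- function(n, prob) {
  k <- length(prob)
  if (n >= k) {
    ## Many draws per weight: the O(k) alias table setup amortizes well.
    alias_draw(n, alias_setup(prob))
  } else if (k > 32) {
    ## Few draws over many weights: binary search beats a linear scan.
    sample_bisect(n, prob)
  } else {
    ## Tiny prob vector: the stock linear-search sample() is fine.
    sample.int(k, n, replace = TRUE, prob = prob)
  }
}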