thr3ads.net - R devel - [Rd] uniform sampling without replacement algorithm [Oct 2017]

Pavel S. Ruzankin

2017-Oct-17 17:55 UTC

[Rd] uniform sampling without replacement algorithm

Let us consider the current algorithm for uniform sampling without 
replacement. It resides in the function do_sample in
https://svn.r-project.org/R/trunk/src/main/random.c
Its complexity is obviously O(n), where the sample is selected from 
1...n, since the algorithm has to create a vector of length n. So when 
the sample size is much smaller than n, the algorithm is inefficient. 
Algorithms with average complexity O(s log s), where s is the sample 
size, were described long ago. E.g. see
https://www.degruyter.com/view/j/mcma.1999.5.issue-1/mcma.1999.5.1.39/mcma.1999.5.1.39.xml
Here the Tree algorithm has complexity O(s log s). I suppose that there 
may be algorithms with complexity close to O(s). Is anybody planning to 
implement a more efficient algorithm?
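
To make the O(n) cost concrete, here is a minimal Python sketch of a
partial shuffle over an explicit pool of length n. It is for illustration
only and is not a transcription of do_sample; the name sample_dense and
the rng parameter are hypothetical.

```python
import random


def sample_dense(n, s, rng=random):
    """Draw s distinct values from 1..n using an explicit pool of length n."""
    if not 0 <= s <= n:
        raise ValueError("need 0 <= s <= n")
    pool = list(range(1, n + 1))  # O(n) time and memory, even when s is tiny
    out = []
    remaining = n  # pool[0:remaining] holds the values not yet drawn
    for _ in range(s):
        j = rng.randrange(remaining)  # uniform index into the undrawn part
        out.append(pool[j])
        remaining -= 1
        pool[j] = pool[remaining]  # fill the hole with the last undrawn value
    return out
```

The loop itself runs s times, so the whole cost beyond O(s) is the
construction of pool. That is the part an O(s log s) or O(s) algorithm
has to avoid.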

Pavel S. Ruzankin

2017-Oct-18 08:08 UTC

If somebody is interested, I can write the code. However, somebody else 
would have to add the code for handling the int / long int / double 
cases, since I do not have enough experience in that.

Pavel S. Ruzankin

2017-Oct-18 13:49 UTC

See also:
P. Gupta, G. P. Bhattacharjee. (1984) An efficient algorithm for random 
sampling without replacement. International Journal of Computer 
Mathematics 16:4, pages 201-209.
http://dx.doi.org/10.1080/00207168408803438

J. Teuhola, O. Nevalainen. (1982) Two efficient algorithms for random 
sampling without replacement. International Journal of Computer 
Mathematics 11:2, pages 127-140.
http://dx.doi.org/10.1080/00207168208803304

In the latter paper the authors claim that their algorithms have O(s) 
complexity. I doubt that this statement is correct. Is it?



William Dunlap

2017-Oct-18 14:38 UTC

Splus used a similar method for sampling from "bigdata" objects.  One
problem was that sample() is used both for creating a sample and for
scrambling the order of a vector.  Scrambling the order of a big vector
wastes time.  It would be nice to be able to tell sample() that we don't
care about the order.
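
As an illustration of what "not caring about the order" can buy, here is
a minimal Python sketch of Floyd's subset-sampling algorithm. It is
swapped in as an example and is not the method used by Splus or R; the
name sample_unordered and the rng parameter are hypothetical. Every
subset of size s is equally likely, but the order in which elements are
added is not a uniformly random permutation, so the result is returned as
a set.

```python
import random


def sample_unordered(n, s, rng=random):
    """Return a uniformly random s-element subset of 1..n (Floyd's algorithm)."""
    if not 0 <= s <= n:
        raise ValueError("need 0 <= s <= n")
    chosen = set()
    for j in range(n - s + 1, n + 1):
        t = rng.randint(1, j)  # uniform on 1..j, both ends included
        # If t was already taken, j itself is guaranteed to be free,
        # because no earlier step could have drawn a value as large as j.
        chosen.add(j if t in chosen else t)
    return chosen
```

This uses s random draws and expected O(s) time and memory with a
hash-based set, independent of n. A caller that does need a random order
can shuffle the s results afterwards.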


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Oct 17, 2017 at 10:55 AM, Pavel S. Ruzankin
<ruzankin at math.nsc.ru> wrote:
> Let us consider the current algorithm for uniform sampling without
> replacement. It resides in the function do_sample in
> https://svn.r-project.org/R/trunk/src/main/random.c
> Its complexity is obviously O(n), where the sample is selected from 1...n,
> since the algorithm has to create a vector of length n. So when the sample
> size is much smaller than n, the algorithm is inefficient. Algorithms with
> average complexity O(s log s), where s is the sample size, were described
> long ago. E.g. see
> https://www.degruyter.com/view/j/mcma.1999.5.issue-1/mcma.1999.5.1.39/mcma.1999.5.1.39.xml
> Here the Tree algorithm has complexity O(s log s). I suppose that there
> may be algorithms with complexity close to O(s). Is anybody planning to
> implement a more efficient algorithm?

Pavel S. Ruzankin

2017-Oct-18 14:54 UTC

The binary tree algorithm does not need additional scrambling. I have 
written the R code for the algorithm in the last answer at:
https://stackoverflow.com/questions/311703/algorithm-for-sampling-without-replacement/46807110#46807110

However, the algorithm will probably be outperformed by hash table 
algorithms for relatively large sample sizes.
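
For reference, one common hash-table approach is a partial shuffle that
stores only the positions that have actually been touched. The following
is a minimal Python sketch of that idea, not the binary tree algorithm
from the linked answer; the name sample_sparse and the rng parameter are
hypothetical.

```python
import random


def sample_sparse(n, s, rng=random):
    """Draw s distinct values from 1..n, in random order, without a length-n pool."""
    if not 0 <= s <= n:
        raise ValueError("need 0 <= s <= n")
    # moved[i] is the value currently stored at virtual position i.
    # Positions absent from the dict still hold their initial value i.
    moved = {}
    out = []
    remaining = n  # virtual positions 0..remaining-1 are still undrawn
    for _ in range(s):
        j = rng.randrange(remaining)
        out.append(moved.get(j, j) + 1)  # +1 maps 0-based positions to 1..n
        remaining -= 1
        # Fill the hole at j with whatever sits at the last undrawn position.
        moved[j] = moved.get(remaining, remaining)
    return out
```

The dictionary never holds more than s entries, so time and memory are
expected O(s) regardless of n, and the output order is already uniformly
random, so no additional scrambling is needed.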
