Nadeem Shafique
2008-May-26 04:18 UTC
[Rd] Listing all possible samples of Size two form Large Population
Respected All,

I need an efficient program or package to draw all possible samples of size two without replacement. I am using the "combinat" package to list all possible samples, but it hangs my computer for larger populations, say 10,000 (i.e. 49,995,000 possible samples). I would like this to work for even larger populations than this, and to repeat the procedure many times. Could anyone kindly work out the possibilities and let me know?

Best Regards,
Nadeem Shafique Butt
Ben Bolker
2008-May-30 16:11 UTC
[Rd] Listing all possible samples of Size two form Large Population
Nadeem Shafique <nadeemshafique <at> gmail.com> writes:

> Respected All,
>
> I need some efficient program or package to draw all possible samples
> of size two without replacement. I am using "combinat" package to list
> all possible samples but it hangs my computer for larger populations
> say 10,000 (i.e. 49995000 all possible samples). I wish to even work
> for larger populations then this and replicate this procedure for many
> times. Kindly can anyone figure out the possibilities and let me know.
>

50 million samples sounds like a lot already -- hope you have a lot of memory (and I am tempted to wonder what you're going to find out that a random subsample wouldn't tell you ...)

  object.size(numeric(5e7))/2^20
  [1] 381.4697

-- already 381MB (although maybe you have a lot of memory), and you have to double that to hold both elements of the combination.

The algorithm for enumerating these samples by brute force is pretty easy --

  for (i in 2:N) {
    for (j in 1:(i-1)) {
      cat(i,j,"\n")
    }
  }

-- but of course these loops will be really slow for large N.

There may (?) be a way to do this in a vectorized fashion (the only quick and dirty ways I can think of doing this involve creating the whole sample and then cutting it down, which is probably not worth the time), e.g.

  > N = 10
  > i=1:N
  > j=1:N
  > e=expand.grid(i,j)
  > m=matrix(1:nrow(e),nrow=N)
  > s=e[m[lower.tri(m)],]

I would create a little snippet of C code to do this. You could also look at the inline package (on CRAN) or Ra and the jit package, although both of these are more experimental than just writing the C code, compiling it, and linking it in.

Bottom line: this should be possible, but I don't know of a package that does it automatically, and if I were you I would think seriously about what question you really want to answer and whether there's a less brute-force way of doing it.

cheers
Ben Bolker
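
For illustration only, here is a minimal sketch of what such a C snippet might look like. It is not code from this thread: the function names pair_count and all_pairs are hypothetical, the caller is assumed to allocate both output arrays with room for N*(N-1)/2 entries, and the glue needed to call it from R is left out. It uses the same enumeration order as the double loop above.

```c
#include <assert.h>
#include <stddef.h>

/* Number of unordered pairs that can be drawn from n items: n * (n - 1) / 2.
 * (Hypothetical helper, not part of any R package.) */
size_t pair_count(size_t n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}

/* Fill first[] and second[] with every pair (i, j) such that
 * 1 <= j < i <= n, in the same order as the nested R loops.
 * Assumption: the caller has allocated pair_count(n) ints for each array. */
void all_pairs(int n, int *first, int *second)
{
    size_t k = 0;

    for (int i = 2; i <= n; i++) {
        for (int j = 1; j < i; j++) {
            first[k] = i;
            second[k] = j;
            k++;
        }
    }

    /* Sanity check: we must have written exactly n * (n - 1) / 2 pairs. */
    assert(k == pair_count(n > 0 ? (size_t)n : 0));
}
```

With N = 10,000 each output array holds 49,995,000 ints, i.e. roughly 190MB per array at 4 bytes per int, so the memory caveat above still applies; the C loop only removes the interpreter overhead of the R loops.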