It seems people are reinventing the wheel here: the goal is to generate all
combinations of size k from 1:n. This (typically) results in a matrix of
dimensions k x choose(n,k), i.e., it needs O(n^k) space and hence is only
applicable to relatively small k. Alternatives have therefore been devised to
generate the combinations "one by one", and I think I remember a
quiz/challenge on this topic about 20 years ago, at an S users' conference in
Australia(?).

Anyway, AFAIK, the first nice and efficient function for this was provided by
Scott Chasalow for S (not R), who made his S code available at the University
of Wageningen as the "combinat" module. This later became an R package, which
is now maintained by Vince Carey. The source of 'combinat' still shows

  # DATE WRITTEN: 14 April 1994      LAST REVISED:  10 July 1995
  # AUTHOR:  Scott Chasalow

OTOH, people have since reinvented the wheel quite prolifically: there's
combinations() in gtools {based on Bill Venables' code from R News 1/1},
combs() in caTools, subsets() in BHH2, and nchoosek() in vsn (Bioconductor);
then there's 'fwd.combn' in package "forward", which states explicitly that
it is Scott's combn() renamed. I stopped searching for more, and I have made
sure that all six of these functions compute the same thing, at least in the
simplest case.

After simply replacing nCm() by choose(), and some other minor tweaks, I now
have a version of combn() that is faster than all the other implementations
{only slightly faster than combinations()}, and I plan to add it to R's
standard package 'utils'. Hopefully the reinventing can be stopped by this,
once people can rely on a relatively fast implementation of the
functionality.

One might also consider including a version of the ``one by one'' combination
generators {mentioned above}, which is needed for larger k.

Opinions ?

Martin Maechler, ETH Zurich
Martin Maechler <maechler at stat.math.ethz.ch> writes:

> tweaks, I have now a version of combn() that is faster than all
> the other implementations {only slightly faster than
> combinations()}, and I plan to add this to R's standard package
> 'utils'.
>
> Hopefully, the reinventing can be stopped by this, once people
> can rely on a relatively fast implementation of the
> functionality.
>
> One might also consider to include a version of the ``one by
> one'' combination generators {as mentioned above} which is
> needed for larger k.
>
> Opinions ?

While you're in there... combn() has a nice feature that you can apply a
function to each combination, which can be quite useful (and fun) if you're
demonstrating permutation tests:

> x <- combn(20, 10, function(i) mean(sleep$extra[i]))
> x <- round(x, 2)  # this is needed, or you get artifacts
> plot(sort(unique(x)), table(x), type="h")
> plot(sort(x), type="l")
> sum(x <= mean(sleep$extra[1:10])) / length(x)
[1] 0.04072398

However, combn() does its work in sapply() style: first create a list, then
simplify. As this quickly becomes a rather *long* list (e.g., a slightly
larger case with choose(29, 14) == 77558760 combinations kills R for me on a
2GB 64-bit machine), it might be desirable to have an option, "assume.scalar"
or so, to specify that the function always returns a single scalar.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
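[The memory problem described above is the "collect a long list, then
simplify" pattern. If the applied function is known to return a scalar (the
hypothetical "assume.scalar" option from the mail), one can instead
preallocate a flat numeric vector of length choose(n, k) and fill it while
streaming over the combinations. A hedged Python sketch of that idea; the
function name and structure are my own, not combn()'s actual implementation:]

```python
from itertools import combinations
from math import comb as choose
import array

def combn_apply_scalar(n, k, fun):
    """Apply `fun` to each k-combination of 1..n, assuming a scalar result.

    Streams over the combinations and writes each result into a
    preallocated flat array of doubles, so peak memory is one double
    per combination, never a long list of intermediate objects.
    """
    out = array.array('d', [0.0]) * choose(n, k)   # preallocated output
    for i, c in enumerate(combinations(range(1, n + 1), k)):
        out[i] = fun(c)                            # one scalar per combination
    return out

# e.g. the mean of each 2-subset of 1..4:
means = combn_apply_scalar(4, 2, lambda c: sum(c) / len(c))
```

[Compared with building a list of per-combination results and simplifying
afterwards, the scalar assumption fixes the element type up front, which is
exactly what makes the preallocation possible.]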