Bill.Venables@CMIS.CSIRO.AU
2000-Dec-17 00:44 UTC
[R] functions for subsets: a minor R programming challenge
Neils Waller asks> -----Original Message----- > From: niels Waller [mailto:niels.waller at home.com] > Sent: Sunday, 17 December 2000 8:31 > To: r-help at stat.math.ethz.ch > Subject: > > Does anyone know of an R (or S-PLUS) function for delineating > all possible combinations and permutations? > > That is, for set (1,2,3) 3_C_2 = {1,2} {1,3} {2,3} >OK, these are for combinations (sets), not permutations (sequences). The first one, subsets(), returns a matrix whose rows are all sets of size r from the set of size n given by the vector v. The second one, Subsets(), is a more involved version of the same thing but can be much more efficient if you are dealing with large sets (but not too large!) The third one, nextSubset(), is designed to be called in a loop and returns the next subset from the entire set arranged in lexicographic order. This is for the case when you are working your way through the subsets one at a time in a loop, anyway. There is, however, a serious catch. The second and third functions are S-PLUS FUNCTIONS ONLY because they make essential use of frame 0 or frame 1, entities that R quite righly eschews as mad, bad and dangerous to have. So the challenge is out for someone to write versions of them that use R "environments" to achieve the same effect but in an elegant and illumating way. The second function uses frame 1 (or frame 0) for caching partial results that can be later re-used and the third uses frame 0 for storing the current subset. I know it is possible since Doug Bates has done so already, but I have lost his code. Of course a part of the challenge is working out how they work without any hints from me in the form of documentation. :-) Bill Venables. _____________________________________________________________ subsets <- function(n, r, v = 1:n) # works in S or R if(r <= 0) vector(mode(v), 0) else if(r >= n) v[1:n] else { rbind(cbind(v[1], Recall(n-1, r-1, v[-1])), Recall(n-1, r, v[-1])) } Subsets <- function(n, r, v = 1:n, frame = 1) # works in S only if(r <= 0) vector(mode(v), 0) else if(r >= n) v[1:n] else { if(r > 1) { i1 <- paste("#", n - 1, r - 1) i2 <- paste("#", n - 1, r) if(!exists(i1)) assign(i1, as.vector(Recall(n - 1, r - 1, 1:(n - 1))), frame = frame) if(!exists(i2)) assign(i2, as.vector(Recall(n - 1, r, 1:(n - 1))), frame = frame) rbind(cbind(v[1], matrix(v[-1][get(i1)], ncol=r-1)), matrix(v[-1][get(i2)], ncol=r)) } else matrix(v[1:n], ncol = 1) } nextSubset <- function(n, r, v = 1:n) { # works in S only nam <- paste(n, r) if(exists(nam, frame = 0)) { ind <- get(nam) s <- seq(along = ind)[ind] if(s[1] == n - r + 1) { remove(nam, frame = 0) return(NULL) } k <- max(s[s < (n-r+1):n]) t <- sum(ind[k:n]) ind[k:n] <- c(F, rep(T, t), rep(F, n - k - t)) } else { ind <- c(rep(T, r), rep(F, n-r)) } assign(nam, ind, frame = 0) v[1:n][ind] } -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Douglas Bates
2000-Dec-17 17:03 UTC
[R] functions for subsets: a minor R programming challenge
Bill.Venables at CMIS.CSIRO.AU writes:> There is, however, a serious catch. The second and third functions are > S-PLUS FUNCTIONS ONLY because they make essential use of frame 0 or frame 1, > entities that R quite righly eschews as mad, bad and dangerous to have. So > the challenge is out for someone to write versions of them that use R > "environments" to achieve the same effect but in an elegant and illumating > way. The second function uses frame 1 (or frame 0) for caching partial > results that can be later re-used and the third uses frame 0 for storing the > current subset. I know it is possible since Doug Bates has done so already, > but I have lost his code.You lost my code! I'm sure that's never happened to me :-) I'll respond to the challenge of creating an R version of the second function having had the benefit of your explaining how it works during our conversations. For the benefit of others reading this, in the Subsets function Bill uses frame 1 to store intermediate results and avoid recalculating them. They are stored in frame 1 so they do not conflict with other objects in the global databases. The names constructed as the strings i1 and i2 are unlikely to occur in practice but you still want to avoid the possibility of conflict and frame 1 gives you a private namespace during the execution of the uppermost call of the function Subsets. (One top-level call of Subsets will generate many other calls of Subset through Recall.) One approach to obtaining a private namespace in R is to use an environment. An R function inherits the environment in which it is defined. Most of the time this is the global environment. Exceptions are when a function is defined within another function so the first R version of the code looks like mkSubsets <- function() # works in R only { .env <- environment() function(n, r, v = 1:n) if(r <= 0) vector(mode(v), 0) else if(r >= n) v[1:n] else { if(r > 1) { i1 <- paste("#", n - 1, r - 1) i2 <- paste("#", n - 1, r) if(!exists(i1)) assign(i1, as.vector(Recall(n - 1, r - 1, 1:(n - 1))), env = .env) if(!exists(i2)) assign(i2, as.vector(Recall(n - 1, r, 1:(n - 1))), env = .env) rbind(cbind(v[1], matrix(v[-1][get(i1)], ncol=r-1)), matrix(v[-1][get(i2)], ncol=r)) } else matrix(v[1:n], ncol = 1) } } Subsets <- mkSubsets() The function mkSubsets exists only to provide a private environment (stored under the name .env) for the definition of the function inside, which is returned as the value of mkSubsets. That's the function that I assign as Subsets. Notice that the frame argument in the calls to Recall has been replaced by env = .env so all of these assignments are taking place in the environment in which subsets was defined. If I print the Subsets function it reproduces the definition and adds <environment: 0x8a758ec> # this number may vary After calling Subsets, there are several objects with those funny names in the environment of the Subsets function> Subsets(5,3)[,1] [,2] [,3] [1,] 1 2 3 [2,] 1 2 4 [3,] 1 2 5 [4,] 1 3 4 [5,] 1 3 5 [6,] 1 4 5 [7,] 2 3 4 [8,] 2 3 5 [9,] 2 4 5 [10,] 3 4 5> objects(env = environment(Subsets))[1] "# 2 1" "# 2 2" "# 3 1" "# 3 2" "# 3 3" "# 4 2" "# 4 3" That is one of the differences between the S version and the R version. The frame 1 used in S disappears after the function returns whereas the environment for the R function persists until the end of the session. The R function will get faster with each execution because the intermediate results are still in the cache but it does this at the expense of taking up more memory. Because the function mkSubsets is used only to create the function Subsets, the whole thing could be changed into an anonymous call. Subsets <- (function() # works in R only { .env <- environment() function(n, r, v = 1:n) if(r <= 0) vector(mode(v), 0) else if(r >= n) v[1:n] else { if(r > 1) { i1 <- paste("#", n - 1, r - 1) i2 <- paste("#", n - 1, r) if(!exists(i1)) assign(i1, as.vector(Recall(n - 1, r - 1, 1:(n - 1))), env = .env) if(!exists(i2)) assign(i2, as.vector(Recall(n - 1, r, 1:(n - 1))), env = .env) rbind(cbind(v[1], matrix(v[-1][get(i1)], ncol=r-1)), matrix(v[-1][get(i2)], ncol=r)) } else matrix(v[1:n], ncol = 1) } })() -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._