thr3ads.net - R help - [R] expand.grid without expanding [Feb 2006]


Luís Torgo

2006-Feb-08 18:08 UTC

[R] expand.grid without expanding

Dear list,
I recently came across a problem that I think I have solved, and I wanted
to share it with you for two reasons:
- Maybe others will come across the same problem.
- Maybe someone has a much simpler solution that they want to share with me ;-)

The problem is as follows: expand.grid() allows you to generate a data.frame 
with all combinations of a set of values, e.g.:
> expand.grid(par1=-1:1,par2=c('a','b'))
  par1 par2
1   -1    a
2    0    a
3    1    a
4   -1    b
5    0    b
6    1    b

There is nothing wrong with this nice function except when you have too many 
combinations to fit in your computer's memory, and that was my problem: I 
wanted to do something for each combination of a set of variants, but this 
set was too large to store in memory in a data.frame generated by 
expand.grid(). A possible solution would be a set of nested for() loops, 
but I preferred a solution with a single for() loop going from 1 to the 
number of combinations, generating combination "i" at each iteration. And 
this was the "real problem": how to write a function that takes the same 
style of arguments as expand.grid() and returns the values corresponding to 
line "i" of the data frame that would have been created by expand.grid(). 
For instance, if I wanted line 4 of the above call to expand.grid(), I 
should get the same as doing:
> expand.grid(par1=-1:1,par2=c('a','b'))[4,]
  par1 par2
4   -1    b

but obviously without having to use expand.grid() as that involves generating 
a data frame that in my case wouldn't fit in the memory of my computer.

Now, the function I've created was the following:
--------------------------------------------
getVariant <- function(id,vars) {

  if (!is.list(vars)) stop('vars needs to be a list!')

  nv <- length(vars)

  lims <- sapply(vars,length)
  if (id > prod(lims)) stop('id above the number of combinations!')
  
  res <- vector("list",nv)

  for(i in nv:2) {

    f <- prod(lims[1:(i-1)])
    
    res[[i]] <- vars[[i]][ceiling(id / f)]

    id <- id - (ceiling(id/f)-1)*f
  }

  res[[1]] <- vars[[1]][id]
  names(res) <- names(vars)
  res

}
--------------------------------------
> expand.grid(par1=-1:1,par2=c('a','b'))[4,]
  par1 par2
4   -1    b
> getVariant(4,list(par1=-1:1,par2=c('a','b')))
$par1
[1] -1

$par2
[1] "b"

I would be glad to know if somebody came across the same problem and has a 
better suggestion on how to solve this.
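
For reference, the single-loop pattern described in this post can be sketched as below. This is only an illustration under stated assumptions, not the code used for the original problem: grid_row() is a hypothetical helper name, it uses lengths() from recent versions of R, and the tiny vars list stands in for a grid that would really be too large to build.

```r
# Hypothetical helper: values on row `id` of expand.grid(vars), computed
# without building the grid. The first variable varies fastest.
grid_row <- function(id, vars) {
  lims <- lengths(vars)
  stopifnot(id >= 1, id <= prod(lims))
  k <- id - 1                        # zero-based position in the grid
  res <- vector("list", length(vars))
  for (j in seq_along(vars)) {
    res[[j]] <- vars[[j]][k %% lims[j] + 1]  # level of variable j
    k <- k %/% lims[j]                       # carry on to the next variable
  }
  names(res) <- names(vars)
  res
}

vars <- list(par1 = -1:1, par2 = c("a", "b"))
n <- prod(lengths(vars))

# One loop over all combinations; only one combination exists at a time.
for (i in seq_len(n)) {
  v <- grid_row(i, vars)
  cat(i, ":", v$par1, v$par2, "\n")
}
```

The body of the loop is where the per-combination work would go; cat() is just a placeholder.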

Thanks,
Luis

-- 
Luis Torgo
    FEP/LIACC, University of Porto   Phone : (+351) 22 339 20 93
    Machine Learning Group           Fax   : (+351) 22 339 20 99
    R. de Ceuta, 118, 6o             email : ltorgo at liacc.up.pt
    4050-190 PORTO - PORTUGAL        WWW   : http://www.liacc.up.pt/~ltorgo

Ray Brownrigg

2006-Feb-08 21:34 UTC


[R] expand.grid without expanding

> From: Luís Torgo <ltorgo at liacc.up.pt>
> Date: Wed, 8 Feb 2006 18:08:40 +0000
> 
> I would be glad to know if somebody came across the same problem and has a 
> better suggestion on how to solve this.

A few minor improvements:
1) let id be a vector of indices
2) use %% and %/% instead of ceiling (perhaps debatable)
3) return a data frame, as expand.grid does
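
As a quick sanity check on point 2, the following sketch confirms that the integer-arithmetic forms agree with the ceiling-based expressions for positive ids. The values of id and f are arbitrary choices for illustration.

```r
id <- 1:24   # arbitrary positive row numbers
f  <- 6      # arbitrary block size (rows per level of the current variable)

# Which level of the current variable: ceiling form vs. integer division
stopifnot(all(ceiling(id / f) == (id - 1) %/% f + 1))

# Remaining position within that level: subtraction form vs. modulo
stopifnot(all(id - (ceiling(id / f) - 1) * f == (id - 1) %% f + 1))
```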

So your function now looks like:

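getVariant <- function(id, vars) {
  if (!is.list(vars)) stop('vars needs to be a list!')
  nv <- length(vars)
  lims <- sapply(vars, length)
  if (any(id > prod(lims))) stop('id above the number of combinations!')
  res <- vector("list", nv)
  # Work from the last (slowest-varying) variable down to the second.
  # rev(seq_len(nv)[-1]) is empty when nv == 1, whereas nv:2 would count up.
  for (i in rev(seq_len(nv)[-1])) {
    f <- prod(lims[1:(i-1)])                 # rows per level of variable i
    res[[i]] <- vars[[i]][(id - 1)%/%f + 1]
    id <- (id - 1)%%f + 1                    # position within that level
  }
  res[[1]] <- vars[[1]][id]
  names(res) <- names(vars)
  return(as.data.frame(res))
}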

Now, for example, you get:
> expand.grid(par1=-1:1,par2=c('a','b'),par3=c('w','x','y','z'))[12:15,]
   par1 par2 par3
12    1    b    x
13   -1    a    y
14    0    a    y
15    1    a    y
> getVariant(12:15,list(par1=-1:1,par2=c('a','b'),par3=c('w','x','y','z')))
  par1 par2 par3
1    1    b    x
2   -1    a    y
3    0    a    y
4    1    a    y

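
A possible shortcut, offered as a sketch and not as part of the replies in this thread: base R's arrayInd() converts linear indices into per-dimension subscripts with the first index varying fastest, which is the same ordering expand.grid() uses. grid_rows() is a hypothetical name, and stringsAsFactors = FALSE is set on both sides so that the comparison does not depend on the R version.

```r
# Hypothetical alternative: look up rows `id` of expand.grid(vars) via arrayInd()
grid_rows <- function(id, vars) {
  lims <- as.numeric(lengths(vars))   # doubles, to avoid integer overflow
  stopifnot(all(id >= 1), all(id <= prod(lims)))
  idx <- arrayInd(id, .dim = lims)    # one row per id, one column per variable
  res <- lapply(seq_along(vars), function(j) vars[[j]][idx[, j]])
  names(res) <- names(vars)
  as.data.frame(res, stringsAsFactors = FALSE)
}

vars <- list(par1 = -1:1, par2 = c("a", "b"), par3 = c("w", "x", "y", "z"))
full <- expand.grid(vars, stringsAsFactors = FALSE)  # small enough to build
got  <- grid_rows(12:15, vars)

# Column-by-column agreement with the materialised grid
for (nm in names(vars)) stopifnot(all(got[[nm]] == full[[nm]][12:15]))
```

The full grid is built here only to check the result on a small case; the point of grid_rows() is that it never needs it.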
Note that you will run into trouble when the product of the lengths is
greater than the largest representable integer on your system.
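
To make this caveat concrete, here is a small sketch. It assumes the usual setup in which prod() returns a double: the count of combinations can then exceed .Machine$integer.max without error, but doubles represent whole numbers exactly only up to 2^53, so beyond that point neighbouring ids can no longer be told apart. The choice of twelve variables with ten levels each is arbitrary.

```r
lims <- rep(10, 12)           # twelve variables with ten levels each
n <- prod(lims)               # 1e12 combinations, stored as a double

n > .Machine$integer.max      # TRUE: too many for a 32-bit integer id
n < 2^53                      # TRUE: still exactly representable as a double
(2^53 + 1) == 2^53            # TRUE: past 2^53, neighbouring ids collide
```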

Hope this helps,
Ray Brownrigg
