Balaji S. Srinivasan
2006-Oct-19 13:05 UTC
[Rd] arraytake for extracting subarrays from multidimensional arrays
Hi,

I recently encountered a problem with array subsetting and came up with a fix. Given an array of arbitrary dimensions, in which the number of dimensions is only known at runtime, I wanted to extract a subarray. The main issue with doing this is that in order to extract a subarray from an array of (say) 4 dimensions you usually specify something like this

    a.subarray <- a[,c(4,2),1:5,]

However, if your code needs to handle an array with an arbitrary number of dimensions then you can't hard code the number of commas while writing the code. (Regarding motivation, the reason this came up is because I wanted to do some toy problems involving conditioning on multiple variables in a multidimensional joint pmf.)

I looked through commands like slice.index and so on, but they seemed to require reshaping and big logical matrix intermediates, which were not memory efficient enough for me. apltake in the magic package was the closest, but it only allowed subsetting of contiguous indices from either the first or last element in any given dimension. It was certainly possible to call apltake multiple times to extract arbitrary subarrays via combinations of index intervals for each dimension, and then combine them with abind as necessary, but this did not seem elegant.

Anyway, I then decided to simply generate code with parse and eval. I found this post by Henrik Bengtsson which had the same idea:

http://tolstoy.newcastle.edu.au/R/devel/05/11/3266.html

I just took that code one step further and put together a utility function that I think might be fairly useful. I haven't completely robustified it against all kinds of pathological inputs, but if there is any interest from the development team it would be nice to add an error-checked version of this to R (or I guess I could keep it in a package).
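As a rough illustration of the slice.index route mentioned above (a sketch only, reusing the 2 x 3 x 4 example array from this post), the selection goes through a logical mask as large as the array itself, the result comes back as a plain vector that has to be reshaped by hand, and the slices come out in storage order rather than in the requested order c(4, 2):

```r
a <- array(1:24, c(2, 3, 4))

# Logical mask with one entry per element of a (the "big intermediate")
keep <- slice.index(a, 3) %in% c(4, 2)

# Logical indexing returns a plain vector, so the dims must be rebuilt by hand
sub <- array(a[keep], c(2, 3, 2))

# The slices are in storage order (2, then 4), not the requested order (4, then 2)
same_as_sorted <- identical(sub, a[, , c(2, 4)])
same_as_requested <- identical(sub, a[, , c(4, 2)])
```

Here `same_as_sorted` is TRUE and `same_as_requested` is FALSE, which is part of why this route needs extra bookkeeping for arbitrary index vectors.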
Simple usage example:
------
> source("arraytake.R")
> a <- array(1:24,c(2,3,4))

> a[,1:3,c(4,2)]  ##This invocation requires hard coding the number of dimensions of a
, , 1

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

> arraytake(a,list(NULL,1:3,c(4,2)))  ##This invocation does not, and produces the same result
, , 1

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

Code below:
--------
arraytake <- function(x,indlist) {

  #Returns subarrays of arbitrary dimensioned arrays
  #1) Let x be a multidimensional array with an arbitrary number of dimensions.
  #2) Let indlist be a list of vectors. The length of indlist is the same as
  #the number of dimensions in x. Each element of the indlist is a vector which
  #specifies which indexes to extract in the corresponding dimension. If the
  #element of the indlist is NULL, then we return all elements in that dimension.

  #The main way this works is by programmatically building up a comma separated
  #argument to "[" as a string and then simply evaluating that expression. This
  #way one does not need to specify the number of commas.

  if(length(dim(x)) != length(indlist)) {
    return(); #we would put some error message here in production code
  }

  #First build up a string w/ indices for each dimension
  d <- length(indlist); #number of dims
  indvecstr <- matrix(0,d,1);
  for(i in 1:d) {
    if(is.null(indlist[[i]])) {
      indvecstr[i] <- "";
    } else{
      indvecstr[i] <- paste("c(",paste(indlist[[i]],sep="",collapse=","),")",sep="")
    }
  }

  #Then build up the argument string to "["
  argstr <- paste(indvecstr,sep="",collapse=",")
  argstr <- paste("x[",argstr,"]",sep="")

  #Finally, return the subsetted array
  return(eval(parse(text=argstr)))
}

--
Dr. Balaji S. Srinivasan
Stanford University
Depts. of Statistics and Computer Science
318 Campus Drive, Clark Center S251
(650) 380-0695
balajis@stanford.edu
http://jinome.stanford.edu
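To make the string-building step concrete, here is a small sketch that reproduces only that part of arraytake() and returns the text that would be handed to parse(). The helper name `build_argstr` is hypothetical and not part of the posted code. It also shows one of the pathological inputs alluded to above: character indices lose their quotes when pasted, so they would be read as variable names rather than dimnames.

```r
# Hypothetical helper: reproduces only the string-building step of arraytake()
build_argstr <- function(indlist) {
  parts <- vapply(indlist, function(ix) {
    # NULL becomes an empty argument; anything else becomes "c(i1,i2,...)"
    if (is.null(ix)) "" else paste0("c(", paste(ix, collapse = ","), ")")
  }, character(1))
  paste0("x[", paste(parts, collapse = ","), "]")
}

# Integer indices: the generated call text matches the hand-written form
numeric_call <- build_argstr(list(NULL, 1:3, c(4, 2)))
# numeric_call is "x[,c(1,2,3),c(4,2)]"

# Character indices: the quotes are lost, so u and v would be looked up
# as variables when the text is parsed and evaluated
name_call <- build_argstr(list(NULL, c("u", "v")))
# name_call is "x[,c(u,v)]"
```

Quoting character indices before pasting (for example with deparse) would be one way around this, at the cost of more string handling.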
Robin Hankin
2006-Oct-19 13:24 UTC
[Rd] arraytake for extracting subarrays from multidimensional arrays
Hi

Your arraytake() function does indeed do something that can't be done elegantly by apltake(), AFAICS.

I think that arraytake() would make a splendid addition to the magic package. Would that be acceptable?

best wishes

rksh

[I can't help thinking that a judicious use of do.call() could replace the eval(parse(text=...)) construction tho' . . .]

does more-or-less this.

--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
Gabor Grothendieck
2006-Oct-19 13:26 UTC
[Rd] arraytake for extracting subarrays from multidimensional arrays
Note that it can also be done with do.call:

    a <- array(1:24, 2:4)
    L <- list(TRUE, 1:3, c(4, 2))
    do.call("[", c(list(a), L))
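Putting the do.call idea together with the NULL convention from the original post, a minimal sketch of an error-checked wrapper might look like the following. The name `arraytake2` and the `drop` argument are assumptions made for illustration; this is not the version proposed for the magic package or for R itself.

```r
# Sketch only: arraytake2 and its drop argument are hypothetical names
arraytake2 <- function(x, indlist, drop = TRUE) {
  d <- length(dim(x))
  if (d == 0L) {
    stop("'x' must have a dim attribute")
  }
  if (!is.list(indlist) || length(indlist) != d) {
    stop("'indlist' must be a list with one element per dimension of 'x'")
  }
  # NULL means "everything in this dimension"; a single TRUE is recycled
  # by "[" and therefore selects the whole extent
  idx <- lapply(indlist, function(ix) if (is.null(ix)) TRUE else ix)
  # Call "[" with x first, then one index per dimension, then drop
  do.call("[", c(list(x), idx, list(drop = drop)))
}

a <- array(1:24, c(2, 3, 4))
res <- arraytake2(a, list(NULL, 1:3, c(4, 2)))
same <- identical(res, a[, 1:3, c(4, 2)])
```

Because the indices are passed as R objects rather than pasted into a string, character and logical index vectors go through unchanged, and nothing needs to be parsed.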