Hans Ekbrand
2012-Feb-25 15:54 UTC
[R] which is the fastest way to make data.frame out of a three-dimensional array?
foo <- rnorm(30*34*12)
dim(foo) <- c(30, 34, 12)

I want to make a data.frame out of this three-dimensional array. Each dimension will be a variable (column) in the data.frame.

I know how this can be done in a very slow way using for loops, like this:

x <- rep(seq(from = 1, to = 30), 34)
y <- as.vector(sapply(1:34, function(x) {rep(x, 30)}))
month <- as.vector(sapply(1:12, function(x) {rep(x, 30*34)}))
my.df <- data.frame(month, x=rep(x, 12), y=rep(y, 12), temp=rep(NA, 30*34*12))
my.counter <- 1
for(month in 1:12){
  for(i in 1:34){
    for(j in 1:30){
      my.df$temp[my.counter] <- foo[j,i,month]
      my.counter <- my.counter + 1
    }
  }
}

str(my.df)
'data.frame':   12240 obs. of  4 variables:
 $ month: int  1 1 1 1 1 1 1 1 1 1 ...
 $ x    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ y    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ temp : num  0.673 -1.178 0.54 0.285 -1.153 ...

(In the real-world problem I had, the data were monthly measurements of temperature, and x and y were coordinates.)

Does anyone care to share a faster and less ugly solution?

TIA

--
Hans Ekbrand
Bert Gunter
2012-Feb-25 16:07 UTC
[R] which is the fastest way to make data.frame out of a three-dimensional array?
Cheat! Arrays are stored in column-major order, so you can translate the indexing directly:

Assume dim(yourarray) = c(n1, n2, n3)

*** warning: UNTESTED ***

yourframe <- data.frame(dat = as.vector(yourarray),
    dim1 = rep(seq_len(n1), n2*n3),
    dim2 = rep(rep(seq_len(n2), each = n1), n3),
    dim3 = rep(seq_len(n3), each = n1*n2))

See also the reshape package, which probably offers more elegant solutions.

Cheers,
Bert

--
Bert Gunter
Genentech Nonclinical Biostatistics
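The column-major claim can be checked on a small array. This is a minimal sketch: the array `a` and the names `n`, `fr`, and `idx` are chosen here for illustration and do not appear in the thread.

```r
# Small test array; the dimensions are arbitrary and only for illustration.
set.seed(1)
a <- array(rnorm(2 * 3 * 4), dim = c(2, 3, 4))
n <- dim(a)

# Index columns built as in the reply above: the first index varies fastest.
fr <- data.frame(
  dat  = as.vector(a),
  dim1 = rep(seq_len(n[1]), times = n[2] * n[3]),
  dim2 = rep(rep(seq_len(n[2]), each = n[1]), times = n[3]),
  dim3 = rep(seq_len(n[3]), each = n[1] * n[2])
)

# Look every value up again through its (dim1, dim2, dim3) triple.
idx <- as.matrix(fr[, c("dim1", "dim2", "dim3")])
stopifnot(identical(a[idx], fr$dat))
```

If the `rep()` calls were in the wrong order, the final `stopifnot()` would fail, so it doubles as a cheap test for the "UNTESTED" code.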
Petr Savicky
2012-Feb-25 17:40 UTC
[R] which is the fastest way to make data.frame out of a three-dimensional array?
On Sat, Feb 25, 2012 at 08:07:01AM -0800, Bert Gunter wrote:
> Cheat! Arrays are stored in column major order, so you can translate
> the indexing directly by:
>
> Assume dim(yourarray) = c(n1,n2,n3)
>
> *** warning: UNTESTED **
>
> yourframe <- data.frame( dat = as.vector(yourarray)
>  , dim1 = rep(seq_len(n1), n2*n3)
>  , dim2 = rep( rep(seq_len(n2), each = n1), n3)
>  , dim3 = rep(seq_len(n3), each = n1*n2)
> )

Hi.

Try this

df <- data.frame(dat=c(foo), which(foo == foo, arr.ind=TRUE))

This may be less efficient, but it is easier to remember.

Hope this helps.

Petr Savicky.
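A related base R route, not proposed in the thread, is `arrayInd()`, which converts linear positions to array subscripts without comparing the data at all. The following is only a sketch; the names `ind` and `df2` and the column names are assumptions made here.

```r
foo <- array(rnorm(30 * 34 * 12), dim = c(30, 34, 12))

# arrayInd() maps linear positions 1..length(foo) to (i, j, k) subscripts,
# so no logical test on the values is involved.
ind <- arrayInd(seq_along(foo), dim(foo))
colnames(ind) <- c("dim1", "dim2", "dim3")  # column names chosen here

# data.frame() splits the matrix into one column per subscript.
df2 <- data.frame(dat = c(foo), ind)
```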
Petr Savicky
2012-Feb-25 17:55 UTC
[R] which is the fastest way to make data.frame out of a three-dimensional array?
On Sat, Feb 25, 2012 at 04:54:30PM +0100, Hans Ekbrand wrote:
> foo <- rnorm(30*34*12)
> dim(foo) <- c(30, 34, 12)
>
> I want to make a data.frame out of this three-dimensional array. Each dimension will be a variabel (column) in the data.frame.

Hi.

Try this

n1 <- dim(foo)[1]
n2 <- dim(foo)[2]
n3 <- dim(foo)[3]
df <- cbind(dat=c(foo), expand.grid(dim1=1:n1, dim2=1:n2, dim3=1:n3))
df[1:5, ]

         dat dim1 dim2 dim3
1 -0.5765847    1    1    1
2  0.4490040    2    1    1
3  0.2626855    3    1    1
4  0.2206713    4    1    1
5  0.9079324    5    1    1
...

Unlike the previous suggestion with foo == foo, this also works in the presence of NA.

Hope this helps.

Petr Savicky.
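The NA point can be seen on a small array. This is a sketch with an arbitrarily chosen array size and NA position; `n_which` and `df` are names picked here for illustration.

```r
foo <- array(rnorm(2 * 3 * 4), dim = c(2, 3, 4))
foo[1, 2, 3] <- NA

# NA == NA evaluates to NA, and which() skips NA, so one position is lost.
n_which <- nrow(which(foo == foo, arr.ind = TRUE))
n_which   # 23, although length(foo) is 24

# expand.grid() never looks at the values, so every cell gets a row.
n <- dim(foo)
df <- cbind(dat = c(foo),
            expand.grid(dim1 = seq_len(n[1]),
                        dim2 = seq_len(n[2]),
                        dim3 = seq_len(n[3])))
nrow(df)  # 24
```

With the `which()` version, `data.frame()` would then be handed 24 values and 23 index rows, which it cannot combine.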
Bert Gunter
2012-Feb-25 19:09 UTC
[R] which is the fastest way to make data.frame out of a three-dimensional array?
Petr:

Your expand.grid solution is clearly much better than my nonsense. It is just as fast (or faster) and is the far more sensible thing to do. For an array, ar, with dim(ar) = c(100,100,1000), modifying your call slightly to:

data.frame(c(ar), do.call(expand.grid, lapply(dim(ar), seq_len)))

I got:

   user  system elapsed
   1.93    0.43    2.38

Using my call I got:

   user  system elapsed
   2.23    0.44    2.70

Thanks for the help.

-- Bert
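Because the `do.call()` form does not hard-code the number of dimensions, it can be wrapped into a small helper. The sketch below is an illustration only: the function name `array_to_df`, its `value_name` argument, and the `dimK` column names are hypothetical and not from the thread.

```r
# Hypothetical helper; name and arguments are illustrative only.
array_to_df <- function(ar, value_name = "dat") {
  # One integer sequence per dimension. expand.grid() varies the first
  # factor fastest, which matches the column-major order of c(ar).
  idx <- do.call(expand.grid, lapply(dim(ar), seq_len))
  names(idx) <- paste0("dim", seq_along(dim(ar)))
  out <- data.frame(c(ar), idx)
  names(out)[1] <- value_name
  out
}

# Works for any number of dimensions, here four.
ar <- array(seq_len(2 * 3 * 4 * 5), dim = c(2, 3, 4, 5))
res <- array_to_df(ar)
```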
Hans Ekbrand
2012-Feb-25 19:23 UTC
[R] which is the fastest way to make data.frame out of a three-dimensional array?
First, thank you both Bert and Petr for your excellent answers. Bert's solution seems somewhat faster, and Petr's is, in my opinion at least, slightly more elegant.

> foo <- rnorm(36 * 150 * 170)
> dim(foo) <- c(36, 150, 170)
> n <- dim(foo)
>
> system.time(my.df <- data.frame(dat = as.vector(foo),
+     dim1 = rep(seq_len(n[1]), n[2]*n[3]),
+     dim2 = rep(rep(seq_len(n[2]), each = n[1]), n[3]),
+     dim3 = rep(seq_len(n[3]), each = n[1]*n[2])))
   user  system elapsed
  0.932   0.156   1.090
>
> system.time(my.df <- cbind(temp=c(foo), expand.grid(dim1=1:n[1], dim2=1:n[2], dim3=1:n[3])))
   user  system elapsed
  0.980   0.252   1.244
>

--
Hans Ekbrand (http://sociologi.cjb.net) <hans at sociologi.cjb.net>
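Single `system.time()` runs that differ by a fraction of a second can easily be within run-to-run noise, so repeating each call gives a steadier comparison. The following is a sketch under that assumption; the wrapper names `by_rep` and `by_grid` and the repetition count of 3 are choices made here, and no particular outcome is implied.

```r
foo <- array(rnorm(36 * 150 * 170), dim = c(36, 150, 170))
n <- dim(foo)

# The two approaches from the thread, wrapped so they can be called repeatedly.
by_rep <- function() {
  data.frame(dat  = as.vector(foo),
             dim1 = rep(seq_len(n[1]), times = n[2] * n[3]),
             dim2 = rep(rep(seq_len(n[2]), each = n[1]), times = n[3]),
             dim3 = rep(seq_len(n[3]), each = n[1] * n[2]))
}
by_grid <- function() {
  cbind(dat = c(foo),
        expand.grid(dim1 = seq_len(n[1]),
                    dim2 = seq_len(n[2]),
                    dim3 = seq_len(n[3])))
}

# Collect the elapsed time of several runs and compare the medians.
t_rep  <- replicate(3, system.time(by_rep())[["elapsed"]])
t_grid <- replicate(3, system.time(by_grid())[["elapsed"]])
c(rep = median(t_rep), grid = median(t_grid))
```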