Glen Barnett
2010-Jul-20 05:51 UTC
[R] best way to apply a list of functions to a dataset ?
Assuming I have a matrix of data (or under some restrictions that will become obvious, possibly a data frame), I want to be able to apply a list of functions (initially producing a single number from a vector) to the data and produce a data frame (for compact output) with column 1 being the function results for the first function, column 2 being the results for the second function and so on - with each row being the columns of the original data. The obvious application of this is to produce summaries of data sets (a bit like summary() does on numeric matrices), but with user supplied functions. I am content for the moment to leave it to the user to supply functions that work with the data they supply so as to produce results that will actually be data-frame-able, though I'd like to ultimately make it a bit nicer than it currently is without compromising the niceness of the output in the "good" cases. The example below is a simplistic approach to this problem (it should run as is). I have named it "fapply" for fairly obvious reasons, but added the ".1" because it doesn't accept multidimensional arrays. I have included the output I generated, which is what I want. There are some obvious generalizations (e.g. being able to include functions like range(), say, that produce several values on a vector, rather than one, making the user's life simpler when a function already does most of what they need). The question is: this looks like a silly approach, growing a list inside a for loop. Also I recall reading that if you find yourself using "do.call" you should probably be doing something else. So my question: Is there a better way to implement a function like this? Or, even better, is there already a function that does this? ## example function and code to apply a list of functions to a matrix (here a numeric data frame) library(datasets) fapply.1 <- function(x, fun.l, colnames=fun.l){ out.l <- list() ? # starts with an empty list for (i in seq_along(fun.l)) out.l[[i]] <- apply(x,2,fun.l[[i]]) ? # loop through list of functions # set up names and make into a data frame names(out.l) <- colnames attr(out.l,"row.names") <- names(out.l[[1]]) attr(out.l,"class") <- "data.frame" out.l } skewness <- function(x) mean(scale(x)^3) ? ? ?#define a simple numeric function mean.gt.med <- function(x) mean(x)>median(x) ?# define a simple non-numeric fn flist <- c("mean","sd","skewness","median","mean.gt.med") # make list of fns to apply fapply.1(attitude,flist) ? ? ? ? ? ? ? mean ? ? ? ?sd ? ?skewness median mean.gt.med rating ? ? 64.63333 12.172562 -0.35792491 ? 65.5 ? ? ? FALSE complaints 66.60000 13.314757 -0.21541749 ? 65.0 ? ? ? ?TRUE privileges 53.13333 12.235430 ?0.37912287 ? 51.5 ? ? ? ?TRUE learning ? 56.36667 11.737013 -0.05403354 ? 56.5 ? ? ? FALSE raises ? ? 64.63333 10.397226 ?0.19754317 ? 63.5 ? ? ? ?TRUE critical ? 74.76667 ?9.894908 -0.86577893 ? 77.5 ? ? ? FALSE advance ? ?42.93333 10.288706 ?0.85039799 ? 41.0 ? ? ? ?TRUE ## end code and output So did I miss something obvious? Any suggestions as far as style or simple stability-enhancing improvements would be handy. regards, Glen
Glen Barnett
2010-Jul-20 06:47 UTC
[R] best way to apply a list of functions to a dataset ?
Erk. Sorry about the wrapping issue on the comments in the code, which will interfere with a straight copypaste. Glen
Dennis Murphy
2010-Jul-20 08:55 UTC
[R] best way to apply a list of functions to a dataset ?
Hi: This might be a little easier (?): library(datasets) skewness <- function(x) mean(scale(x)^3) mean.gt.med <- function(x) mean(x)>median(x) # ------ # construct the function to apply to each variable in the data frame f <- function(x) c(mean = mean(x), sd = sd(x), skewness = skewness(x), median = median(x), mean.gt.med = mean.gt.med(x)) # map function to each variable with lapply and combine with do.call(): do.call(rbind, lapply(attitude, f)) mean sd skewness median mean.gt.med rating 64.63333 12.172562 -0.35792491 65.5 0 complaints 66.60000 13.314757 -0.21541749 65.0 1 privileges 53.13333 12.235430 0.37912287 51.5 1 learning 56.36667 11.737013 -0.05403354 56.5 0 raises 64.63333 10.397226 0.19754317 63.5 1 critical 74.76667 9.894908 -0.86577893 77.5 0 advance 42.93333 10.288706 0.85039799 41.0 1 HTH, Dennis On Mon, Jul 19, 2010 at 10:51 PM, Glen Barnett <glnbrntt@gmail.com> wrote:> Assuming I have a matrix of data (or under some restrictions that will > become obvious, possibly a data frame), I want to be able to apply a > list of functions (initially producing a single number from a vector) > to the data and produce a data frame (for compact output) with column > 1 being the function results for the first function, column 2 being > the results for the second function and so on - with each row being > the columns of the original data. > > The obvious application of this is to produce summaries of data sets > (a bit like summary() does on numeric matrices), but with user > supplied functions. I am content for the moment to leave it to the > user to supply functions that work with the data they supply so as to > produce results that will actually be data-frame-able, though I'd like > to ultimately make it a bit nicer than it currently is without > compromising the niceness of the output in the "good" cases. > > The example below is a simplistic approach to this problem (it should > run as is). I have named it "fapply" for fairly obvious reasons, but > added the ".1" because it doesn't accept multidimensional arrays. I > have included the output I generated, which is what I want. There are > some obvious generalizations (e.g. being able to include functions > like range(), say, that produce several values on a vector, rather > than one, making the user's life simpler when a function already does > most of what they need). > > The question is: this looks like a silly approach, growing a list > inside a for loop. Also I recall reading that if you find yourself > using "do.call" you should probably be doing something else. > > So my question: Is there a better way to implement a function like this? > > Or, even better, is there already a function that does this? > > ## example function and code to apply a list of functions to a matrix > (here a numeric data frame) > > library(datasets) > > fapply.1 <- function(x, fun.l, colnames=fun.l){ > out.l <- list() # starts with an empty list > for (i in seq_along(fun.l)) out.l[[i]] <- apply(x,2,fun.l[[i]]) # > loop through list of functions > > # set up names and make into a data frame > names(out.l) <- colnames > attr(out.l,"row.names") <- names(out.l[[1]]) > attr(out.l,"class") <- "data.frame" > out.l > } > > skewness <- function(x) mean(scale(x)^3) #define a simple numeric > function > mean.gt.med <- function(x) mean(x)>median(x) # define a simple non-numeric > fn > flist <- c("mean","sd","skewness","median","mean.gt.med") # make list > of fns to apply > > fapply.1(attitude,flist) > mean sd skewness median mean.gt.med > rating 64.63333 12.172562 -0.35792491 65.5 FALSE > complaints 66.60000 13.314757 -0.21541749 65.0 TRUE > privileges 53.13333 12.235430 0.37912287 51.5 TRUE > learning 56.36667 11.737013 -0.05403354 56.5 FALSE > raises 64.63333 10.397226 0.19754317 63.5 TRUE > critical 74.76667 9.894908 -0.86577893 77.5 FALSE > advance 42.93333 10.288706 0.85039799 41.0 TRUE > > ## end code and output > > So did I miss something obvious? > > Any suggestions as far as style or simple stability-enhancing > improvements would be handy. > > regards, > Glen > > ______________________________________________ > R-help@r-project.org mailing list > stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >[[alternative HTML version deleted]]