Mike Lawrence
2007-Jul-13 16:29 UTC
[Rd] Suggestion to extend aggregate() to return multiple and/or named values
Hi all,

This is my first post to the developers list. As I understand it, aggregate() currently applies a function to each cell of a data frame but can only handle functions that return a single value. aggregate() also cannot retain the names given to the returned value. I've created an agg() function (pasted below) that is apparently backwards compatible (i.e. it returns results identical to those of aggregate() if the function returns a single unnamed value), but is able to handle named and/or multiple return values. The code may be a little inefficient (there must be an easier way to set up the 'temp' data frame than to call aggregate and remove the final column), but I'm suggesting that something similar to this may be profitably used to replace aggregate entirely.

#modified aggregate command, allowing for multiple/named output values
agg=function(z,Ind,FUN,...){
  FUN.out=by(z,Ind,FUN,...)
  num.cells=length(FUN.out)
  num.dv=length(FUN.out[[1]])

  temp=aggregate(z,Ind,length) #dummy data frame
  temp=temp[,c(1:(length(temp)-1))] #remove last column from dummy frame

  for(i in 1:num.dv){
    temp=cbind(temp,NA)
    n=names(FUN.out[[1]])[i]
    names(temp)[length(temp)]=ifelse(!is.null(n),n,ifelse(i==1,'x',paste('x',i,sep='')))
    for(j in 1:num.cells){
      temp[j,length(temp)]=FUN.out[[j]][i]
    }
  }
  return(temp)
}

#create some factored data
z=rnorm(100) # the DV
A=rep(1:2,each=25,2) #one factor
B=rep(1:2,each=50) #another factor
Ind=list(A=A,B=B) #the factor list

aggregate(z,Ind,mean) #show the means of each cell
agg(z,Ind,mean) #should be identical to aggregate

aggregate(z,Ind,summary) #returns an error
agg(z,Ind,summary) #returns named columns

#Make a function that returns multiple unnamed values
summary2=function(x){
  s=summary(x)
  names(s)=NULL
  return(s)
}
agg(z,Ind,summary2) #returns multiple columns, default names

--
Mike Lawrence
Graduate Student, Department of Psychology, Dalhousie University

Website: http://memetic.ca

Public calendar: http://icalx.com/public/informavore/Public

"The road to wisdom? Well, it's plain and simple to express: Err and err and err again, but less and less and less." - Piet Hein
Gabor Grothendieck
2007-Jul-13 17:04 UTC
[Rd] Suggestion to extend aggregate() to return multiple and/or named values
Note that summaryBy in the doBy package can also do that.

library(doBy)
DF <- data.frame(z, A = Ind$A, B = Ind$B)
summaryBy(z ~ A + B, DF, FUN = summary)
summaryBy(z ~ A + B, DF, FUN = summary2)

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
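For readers without doBy installed, a similar table can be sketched in base R. This is only a sketch under one assumption: that the installed R has an aggregate() which accepts a FUN returning several values and stores them in a matrix column (the thread reports that the 2007 version returned an error for this call). The name `flat` and the seed are illustrative and not from the thread.

```r
# Example data from the first message (seed added only for reproducibility)
set.seed(1)
z <- rnorm(100)
A <- rep(1:2, each = 25, times = 2)
B <- rep(1:2, each = 50)
Ind <- list(A = A, B = B)

# Assumption: aggregate() supports vector-valued FUN. The six summary
# values for each cell are then stored as one matrix column named "x".
ag <- aggregate(z, Ind, summary)

# Spread the matrix column into ordinary columns, one per summary value
flat <- do.call(data.frame, ag)
print(flat)
```

The result has one row per cell and one column per value returned by summary(), which is roughly the shape that agg() and summaryBy() aim for.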
Mike Lawrence
2007-Jul-13 17:46 UTC
[Rd] Suggestion to extend aggregate() to return multiple and/or named values
Bugfix already :P The prior version fails when there is only one factor in Ind. This version also might be faster, as I avoid using aggregate to create the dummy frame.

agg=function(z,Ind,FUN,...){
  FUN.out=by(z,Ind,FUN,...)
  num.cells=length(FUN.out)
  num.values=length(FUN.out[[1]])

  #reduce each factor to its unique values
  for(i in 1:length(Ind)){
    Ind[[i]]=unique(Ind[[i]])
  }

  temp=expand.grid(Ind) #one row per cell

  #add one column per returned value, named after FUN's output if possible
  for(i in 1:num.values){
    temp$new=NA
    n=names(FUN.out[[1]])[i]
    names(temp)[length(temp)]=ifelse(!is.null(n),n,ifelse(i==1,'x',paste('x',i,sep='')))
    for(j in 1:num.cells){
      temp[j,length(temp)]=FUN.out[[j]][i]
    }
  }
  return(temp)
}
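The cell-by-cell loops in agg() could perhaps be replaced by a split-and-bind approach. The following is a minimal sketch, not the code posted in this thread: `agg_sketch` is a hypothetical name, and it assumes that z is a plain vector, that Ind is a named list of grouping vectors, and that FUN returns the same number of numeric values for every cell. It takes the factor labels of each cell from the data rather than from expand.grid(unique(...)), since unique() returns values in order of first appearance while by() orders cells by sorted factor level, so the two line up only when those orders agree (as they do in the example data).

```r
# Hypothetical helper, not from the thread: a loop-free variant of agg()
agg_sketch <- function(z, Ind, FUN, ...) {
  keys <- as.data.frame(Ind)               # one column per grouping factor
  grp  <- interaction(keys, drop = TRUE)   # one level per non-empty cell
  rows <- split(seq_along(z), grp)         # indices of z that fall in each cell
  vals <- lapply(rows, function(i) FUN(z[i], ...))

  # Keep the names FUN returns; otherwise fall back to x, x2, x3, ...
  nm <- names(vals[[1]])
  k  <- length(vals[[1]])
  if (is.null(nm)) nm <- if (k == 1) "x" else c("x", paste0("x", 2:k))

  # Assumes every cell yields the same number of numeric values
  stats <- do.call(rbind, lapply(vals, as.numeric))
  colnames(stats) <- nm

  # The first observation in each cell supplies that cell's factor values
  first <- vapply(rows, function(i) i[1], integer(1))
  out <- cbind(keys[first, , drop = FALSE], as.data.frame(stats))
  rownames(out) <- NULL
  out
}

# Example data from the first message (seed added only for reproducibility)
set.seed(1)
z <- rnorm(100)
Ind <- list(A = rep(1:2, each = 25, times = 2), B = rep(1:2, each = 50))

res_mean    <- agg_sketch(z, Ind, mean)     # single unnamed value: column "x"
res_summary <- agg_sketch(z, Ind, summary)  # named values: one column each
print(res_mean)
print(res_summary)
```

Because empty cells are dropped before FUN is called, this sketch returns one row per non-empty cell only; whether that is preferable to keeping empty cells as NA rows is a design choice the thread does not settle.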