Dear all,

I'm wanting to do a series of comparisons among 4 categorical variables:

    a <- aggregate(y, list(var1, var2, var3, var4), sum)

This gets me a very nice 2-dimensional data frame with one column per
variable, BUT, as help for aggregate says, "empty subsets are removed".
I don't see in help(aggregate) how I can change this.

In contrast,

    a <- tapply(y, list(var1, var2, var3, var4), sum)

gives me results for everything including empty subsets, but in an
awkward 4-dimensional array that takes me another 10 lines of
inefficient code to turn into a 2D data.frame.

Is there a way to directly do this calculation INCLUDING results for
empty subsets, and still obtain a 2D array, matrix, or data.frame? OR
alternatively, is there a simple way to mush the 4D result from the
tapply into a 2D matrix/data.frame?

Thanks very much in advance for any help!

-jlb

--
************************************
Joseph P. LeBouton
Forest Ecology PhD Candidate
Department of Forestry
Michigan State University
East Lansing, Michigan 48824

Office phone: 517-355-7744
email: lebouton at msu.edu
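To make the difference described above concrete, here is a minimal sketch with made-up data. The names d, g1, g2, agg and tap are illustrative assumptions, not from the original post, and only two grouping variables are used to keep the output small.

```r
# Hypothetical toy data: the combination (g1 = "b", g2 = "y") never occurs
d <- data.frame(
  g1 = c("a", "a", "b"),
  g2 = c("x", "y", "x"),
  y  = c(1, 2, 3)
)

# aggregate() returns a flat data frame, but only for observed combinations
agg <- aggregate(d$y, list(g1 = d$g1, g2 = d$g2), sum)

# tapply() returns an array with one cell per combination, NA where empty
tap <- tapply(d$y, list(g1 = d$g1, g2 = d$g2), sum)

nrow(agg)             # 3: the empty (b, y) subset is missing
dim(tap)              # 2 2
is.na(tap["b", "y"])  # TRUE
```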
I faced a similar problem. Here's what I did:

    tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                      B = sample(letters[1:5], 10, replace = TRUE),
                      C = rnorm(10))
    tmp1 <- with(tmp, aggregate(C, list(A = A, B = B), sum))
    tmp2 <- expand.grid(A = sort(unique(tmp$A)), B = sort(unique(tmp$B)))
    merge(tmp2, tmp1, all.x = TRUE)

At least it's fewer than 10 extra lines of code. Anyone with a simpler
solution?

Cheers,
Hans

--
*********************************
Hans Gardfjell
Ecology and Environmental Science
Umeå University
90187 Umeå, Sweden
email: hans.gardfjell at emg.umu.se
phone: +46 907865267
mobile: +46 705984464
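The aggregate / expand.grid / merge idea above can be wrapped up so that it works for any number of grouping variables, including the four in the original question. This is only a sketch: the helper name aggregate_full and the toy vectors are assumptions made for illustration, and it relies on aggregate() naming its result column "x" when given a plain vector, so none of the grouping variables may be called "x".

```r
# Hypothetical helper (not from the thread): aggregate, then pad the result
# with every combination of the observed grouping values.
aggregate_full <- function(y, by, FUN = sum) {
  # 'by' must be a *named* list of grouping vectors, each as long as 'y'
  agg <- aggregate(y, by, FUN)  # observed combinations only; result column "x"
  grid <- expand.grid(lapply(by, function(g) sort(unique(g))),
                      stringsAsFactors = FALSE)  # all combinations
  merge(grid, agg, all.x = TRUE)  # combinations absent from 'agg' get NA
}

# Toy data: the combination (g1 = "b", g2 = "y") never occurs
g1 <- c("a", "a", "b")
g2 <- c("x", "y", "x")
y  <- c(1, 2, 3)

res <- aggregate_full(y, list(g1 = g1, g2 = g2))
res
```

Note that this only pads combinations of values that occur at least once somewhere in the data; a factor level that is never observed at all would still be missing.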
Thanks, Phil! I've literally spent two hours on my own trying to find
something that does exactly that. Thanks for another pair of functions
added to my (slowly!) growing R vocabulary.

-jlb

Phil Spector wrote:
> Joseph -
>    I'm sure there are clearer and more efficient ways to do it, but
> here's something that seems to do what you want:
>
> z = tapply(y,list(var1,var2,var3,var4),sum)
> data.frame(do.call('expand.grid',dimnames(z)),y=do.call('rbind',as.list(z)))
>
>                                        - Phil Spector
>                                          Statistical Computing Facility
>                                          Department of Statistics
>                                          UC Berkeley
>                                          spector at stat.berkeley.edu
>
> On Sat, 11 Feb 2006, Joseph LeBouton wrote:
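A related route that may be worth trying, offered here as a sketch and not as anything proposed in the thread, is to treat the tapply() result as a table and let base R's as.data.frame() method for tables do the flattening. The toy vectors below are made up; giving the list passed to tapply() names makes those names carry through as column names.

```r
# Hypothetical toy data: the combination (var1 = "b", var2 = "y") never occurs
var1 <- c("a", "a", "b")
var2 <- c("x", "y", "x")
y    <- c(1, 2, 3)

# One array cell per combination, NA for the empty one
z <- tapply(y, list(var1 = var1, var2 = var2), sum)

# as.table() marks the array as a table; as.data.frame() on a table
# returns one row per cell, with the cell values in 'responseName'
long <- as.data.frame(as.table(z), responseName = "y")
long
```

The same call should work unchanged on a 4-dimensional tapply() result, since the table method produces one row per cell whatever the number of dimensions.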