thr3ads.net - R help - [R] aggregate vs tapply; is there a middle ground? [Feb 2006]

Joseph LeBouton

2006-Feb-11 21:28 UTC

[R] aggregate vs tapply; is there a middle ground?

Dear all,

I'm wanting to do a series of comparisons among 4 categorical variables:

a <- aggregate(y, list(var1, var2, var3, var4), sum)

This gets me a very nice 2-dimensional data frame with one column per 
variable, BUT, as help for aggregate says, <<empty subsets are 
removed>>.  I don't see in help(aggregate) how I can change this.

In contrast,
a <- tapply(y, list(var1, var2, var3, var4), sum)

gives me results for everything including empty subsets, but in an 
awkward 4-dimensional array that takes me another 10 lines of 
inefficient code to turn into a 2D data.frame.

Is there a way to directly do this calculation INCLUDING results for 
empty subsets, and still obtain a 2D array, matrix, or data.frame?  OR 
alternatively is there a simple way to mush the 4D result from the 
tapply into a 2D matrix/data.frame?

thanks very much in advance for any help!

-jlb

-- 
************************************
Joseph P. LeBouton
Forest Ecology PhD Candidate
Department of Forestry
Michigan State University
East Lansing, Michigan 48824

Office phone: 517-355-7744
email: lebouton at msu.edu
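
The difference described in this message can be reproduced on a small example. This is only a sketch on hypothetical toy data: the data frame d and its columns A, B and y are made up for illustration, and it uses two grouping factors rather than the four in the question.

```r
# Hypothetical toy data: the combination A = "b", B = "y" never occurs
d <- data.frame(
  A = factor(c("a", "a", "b"), levels = c("a", "b")),
  B = factor(c("x", "y", "x"), levels = c("x", "y")),
  y = c(1, 2, 3)
)

# aggregate() returns a flat data frame but leaves out the empty (b, y) cell
agg <- aggregate(d$y, list(A = d$A, B = d$B), sum)
nrow(agg)

# tapply() keeps every cell and returns an array, with NA for the empty one
tab <- tapply(d$y, list(A = d$A, B = d$B), sum)
dim(tab)
tab["b", "y"]
```

With four grouping factors the tapply() result has four dimensions instead of two, which is the awkward shape the question is about.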

Hans Gardfjell

2006-Feb-11 22:24 UTC


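I faced a similar problem. Here's what I did:

# example data: two grouping columns and one numeric column
tmp <- data.frame(A = sample(LETTERS[1:5], 10, replace = TRUE),
                  B = sample(letters[1:5], 10, replace = TRUE),
                  C = rnorm(10))
# sums for the combinations that actually occur
tmp1 <- with(tmp, aggregate(C, list(A = A, B = B), sum))
# every possible combination of the observed A and B values
tmp2 <- expand.grid(A = sort(unique(tmp$A)), B = sort(unique(tmp$B)))
# keep all combinations; the empty ones get NA
merge(tmp2, tmp1, all.x = TRUE)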

At least fewer than 10 extra lines of code. Anyone with a simpler solution?

Cheers, Hans



-- 

*********************************
Hans Gardfjell
Ecology and Environmental Science
Umeå University
90187 Umeå, Sweden
email: hans.gardfjell at emg.umu.se
phone:  +46 907865267
mobile: +46 705984464
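
The expand.grid()/merge() idea in this reply extends to any number of grouping columns. A minimal sketch, assuming the data sit in a data frame; the helper name aggregate_full and its arguments are hypothetical, not part of base R or of the original post.

```r
# Hypothetical helper: aggregate, then merge onto the full grid of observed values
aggregate_full <- function(data, value, by, FUN = sum) {
  # summary for the combinations that actually occur
  agg <- aggregate(data[value], data[by], FUN)
  # every combination of the observed values of the grouping columns
  grid <- expand.grid(lapply(data[by], function(v) sort(unique(v))),
                      stringsAsFactors = FALSE)
  # keep every grid row; combinations without data get NA
  merge(grid, agg, all.x = TRUE)
}

# Toy data (made up): the combination A = "b", B = "y" is missing
d <- data.frame(A = c("a", "a", "b"),
                B = c("x", "y", "x"),
                y = c(1, 2, 3),
                stringsAsFactors = FALSE)
res <- aggregate_full(d, value = "y", by = c("A", "B"))
res
```

Note that the grid is built from values that appear in the data, so a level that never occurs at all in a column is still left out; building the grid from levels() of a factor would cover that case.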

Joseph LeBouton

2006-Feb-11 22:45 UTC


Thanks, Phil!  I've literally spent two hours on my own trying to find 
something that does exactly that.  Thanks for another pair of functions 
added to my (slowly!) growing R vocabulary.

-jlb

Phil Spector wrote:
> Joseph -
>    I'm sure there are clearer and more efficient ways to do it, but
> here's something that seems to do what you want:
> 
> z = tapply(y,list(var1,var2,var3,var4),sum)
> data.frame(do.call('expand.grid',dimnames(z)),y=do.call('rbind',as.list(z)))
> 
> 
>                                        - Phil Spector
>                      Statistical Computing Facility
>                      Department of Statistics
>                      UC Berkeley
>                      spector at stat.berkeley.edu
> 

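Another way to flatten a tapply() result, not mentioned in the thread, is base R's as.data.frame() method for tables, which keeps the NA cells. A minimal sketch on hypothetical toy data (d, A, B and y are made-up names, and only two grouping factors are used):

```r
# Toy data (made up): the combination A = "b", B = "y" never occurs
d <- data.frame(
  A = factor(c("a", "a", "b"), levels = c("a", "b")),
  B = factor(c("x", "y", "x"), levels = c("x", "y")),
  y = c(1, 2, 3)
)

# Naming the list elements gives the array named dimnames
z <- tapply(d$y, list(A = d$A, B = d$B), sum)

# One row per cell of the array, including the empty one (as NA)
long <- as.data.frame(as.table(z), responseName = "y")
long
```

The rows come out in the same order as the elements of the array (first factor varying fastest), which is also the order expand.grid() produces in the expression quoted above.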