Marius Hofert
2013-Mar-11 20:52 UTC
[R] aggregate(), tapply(): Why is the order of the grouping variables not kept?
Dear expeRts,

The question is rather simple: Why does aggregate() (or similarly tapply()) not keep the order of the grouping variable(s)?

Here is an example:

x <- data.frame(group = rep(LETTERS[1:2], each=10),
                year = rep(rep(2001:2005, each=2), 2),
                value = rep(1:10, each=2))
## => sorted according to group, then year
aggregate(value ~ group + year, data=x, FUN=function(z) z[1])
## => sorted according to year, then group

I rather expected this to be the default:

aggregate(value ~ year + group, data=x, FUN=function(z) z[1])[,c(2,1,3)]
## => same order as input (grouping) variables

Same with tapply:

as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=function(z) z[1])))

Cheers,

Marius
Peter Ehlers
2013-Mar-11 23:59 UTC
On 2013-03-11 13:52, Marius Hofert wrote:
> Dear expeRts,
>
> The question is rather simple: Why does aggregate (or similarly tapply()) not keep the order of the grouping variable(s)?
>
> Here is an example:
>
> x <- data.frame(group = rep(LETTERS[1:2], each=10),
>                 year = rep(rep(2001:2005, each=2), 2),
>                 value = rep(1:10, each=2))
> ## => sorted according to group, then year
> aggregate(value ~ group + year, data=x, FUN=function(z) z[1])
> ## => sorted according to year, then group
>
> I rather expected this to be the default:
>
> aggregate(value ~ year + group, data=x, FUN=function(z) z[1])[,c(2,1,3)]
> ## => same order as input (grouping) variables
>
> Same with tapply:
>
> as.data.frame(as.table(tapply(x$value, list(x$group, x$year), FUN=function(z) z[1])))
>
> Cheers,
>
> Marius

I'm no expeRt, but suppose that we change the setup slightly:

xx <- x[sample(nrow(x)), ]

Now what would you like

aggregate(value ~ group + year, data=xx, FUN=function(z) z[1])

to return?

Personally, I prefer to have R return the same thing regardless of how the input dataframe is sorted, i.e. the result should depend only on the formula. You just have to know that the order is to have the first factor vary most rapidly, then the next, etc. I think that's documented somewhere, but I don't know where.

Peter Ehlers
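Peter's point can be checked directly. The following is a minimal sketch, not part of the original post: it reuses x from the question, while set.seed(1), the helper first(), and the names res_sorted, res_shuffled and same are illustrative additions. Note that both rows in every (group, year) cell of x carry the same value, so z[1] does not depend on row order for this data set; in general, a function that picks the first element would be sensitive to the row order within a cell.

```r
# Data from the original post
x <- data.frame(group = rep(LETTERS[1:2], each = 10),
                year  = rep(rep(2001:2005, each = 2), 2),
                value = rep(1:10, each = 2))

set.seed(1)                     # illustrative seed, only for reproducibility
xx <- x[sample(nrow(x)), ]      # same rows in random order

first <- function(z) z[1]       # hypothetical helper standing in for FUN
res_sorted   <- aggregate(value ~ group + year, data = x,  FUN = first)
res_shuffled <- aggregate(value ~ group + year, data = xx, FUN = first)

# Compare the two results column by column
same <- all(as.character(res_sorted$group) == as.character(res_shuffled$group),
            res_sorted$year  == res_shuffled$year,
            res_sorted$value == res_shuffled$value)
print(same)
print(res_sorted)               # group (the first factor) varies fastest
```

If same is TRUE, the row order of the result is determined by the formula alone and not by how the input rows happen to be sorted.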
Marius Hofert
2013-Mar-12 13:25 UTC
> I'm no expeRt, but suppose that we change the setup slightly:
>
> xx <- x[sample(nrow(x)), ]
>
> Now what would you like
>
> aggregate(value ~ group + year, data=xx, FUN=function(z) z[1])
>
> to return?
>
> Personally, I prefer to have R return the same thing regardless
> of how the input dataframe is sorted,

Personally, I prefer that R change my input as little as possible... but I totally agree with you that there are other instances where it's preferable that the output does not depend on the input.

> i.e. the result should depend only on the formula. You just have to know that
> the order is to have the first factor vary most rapidly,

... which I still find very confusing/unnatural, but okay.

> then the next, etc. I think that's documented somewhere, but I don't know
> where.

It's also the default behavior of expand.grid(), for example.

Cheers,

Marius

> Peter Ehlers
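The expand.grid() comparison, and one way to get the ordering asked for in the original question, can be sketched as follows. This is an illustration under stated assumptions rather than anything posted in the thread: the names g, res and res_by_group are made up here, and re-sorting with order() is just one possible approach.

```r
# expand.grid(): the first argument varies fastest, like aggregate()'s output
g <- expand.grid(group = c("A", "B"), year = 2001:2003)
print(g)

# Data from the original post
x <- data.frame(group = rep(LETTERS[1:2], each = 10),
                year  = rep(rep(2001:2005, each = 2), 2),
                value = rep(1:10, each = 2))

res <- aggregate(value ~ group + year, data = x, FUN = function(z) z[1])

# Re-sort afterwards so that group varies slowest and year fastest
res_by_group <- res[order(res$group, res$year), ]
rownames(res_by_group) <- NULL  # drop the permuted row names
print(res_by_group)
```

Sorting explicitly after aggregating keeps the column order of the formula, so no column reshuffling such as [,c(2,1,3)] is needed.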