tom soyer
2007-Dec-16 14:23 UTC
[R] question about the aggregate function with respect to order of levels of grouping elements
Hi,

I am using aggregate() to add up groups of data according to year and month. It seems that aggregate() automatically sorts the levels of the grouping factors alphabetically, even if a factor with its levels in a specific order is supplied. I am wondering if this is a bug, or if I missed something important. Below is an example that shows what I mean. Does anyone know if this is just the way aggregate() works, or is there a way to force aggregate() to keep the order of levels supplied by the grouping elements? Thanks!

library(chron)
dts=seq.dates("1/1/01","12/31/03")
rnum=rnorm(1:length(dts))
df=data.frame(date=dts,obs=rnum)
agg=aggregate(df[,2],list(year=years(df[,1]),month=months(df[,1])),sum)
levels(agg$month) # aggregate() automatically generates levels sorted by alphabet.

[1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct" "Sep"

fmonth=factor(months(df[,1]))
levels(fmonth) # factor() automatically generates the correct order of levels.

[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

agg2=aggregate(df[,2],list(year=years(df[,1]),month=fmonth),sum)
levels(agg2$month) # even if a factor with levels in the correct order is supplied, aggregate() sorts the levels by alphabet regardless.

[1] "Apr" "Aug" "Dec" "Feb" "Jan" "Jul" "Jun" "Mar" "May" "Nov" "Oct" "Sep"

--
Tom
Gabor Grothendieck
2007-Dec-16 14:50 UTC
[R] question about the aggregate function with respect to order of levels of grouping elements
This does look strange. Note that aggregate.zoo in the zoo package would work here:

> library(zoo)
> aggregate(zoo(rnum, dts), as.yearmon, sum)
   Jan 2001    Feb 2001    Mar 2001    Apr 2001    May 2001    Jun 2001
 4.43610085  0.49842227  7.52139932  1.47917343 10.64459923 -1.22530586
   Jul 2001    Aug 2001    Sep 2001    Oct 2001    Nov 2001    Dec 2001
 8.19563685  1.57626974  1.28842871  2.50540074  0.71156951  0.54118342
   Jan 2002    Feb 2002    Mar 2002    Apr 2002    May 2002    Jun 2002
-0.41292840 -2.41301496  3.23783551  0.63914807 -1.46357402  2.91651492
   Jul 2002    Aug 2002    Sep 2002    Oct 2002    Nov 2002    Dec 2002
 2.17263290 -2.30981022 -9.60701788  1.16504368 -3.07038254  1.38281927
   Jan 2003    Feb 2003    Mar 2003    Apr 2003    May 2003    Jun 2003
 4.48761479  2.42455090 -0.03743888  1.11223001 -4.07988016 -1.15116293
   Jul 2003    Aug 2003    Sep 2003    Oct 2003    Nov 2003    Dec 2003
-7.15292576 -2.34231702 -0.48132751 11.74252191  2.51063034 -4.35801058
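For readers who do not have zoo installed, a similar year-month grouping can probably be approximated in base R by grouping on a key that sorts chronologically as text. The following is only a minimal sketch under assumptions that are not part of the thread: it uses base Date objects rather than chron dates, a fixed seed, and the illustrative names ym and yearmon.

```r
# Sketch only: base Date objects stand in for the chron dates in the thread.
set.seed(1)
dts  <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
rnum <- rnorm(length(dts))

# "%Y-%m" keys such as "2001-01" sort chronologically even when sorted as text,
# so the result does not depend on how aggregate() orders the groups.
ym  <- format(dts, "%Y-%m")
agg <- aggregate(rnum, list(yearmon = ym), sum)

head(agg)
```

Because the key is numeric text, alphabetical and chronological order coincide, which sidesteps the level-ordering question entirely.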
jim holtman
2007-Dec-16 15:04 UTC
[R] question about the aggregate function with respect to order of levels of grouping elements
What version of R are you using? Here is the output I got with 2.6.1:

> library(chron)
> dts=seq.dates("1/1/01","12/31/03")
> rnum=rnorm(1:length(dts))
> df=data.frame(date=dts,obs=rnum)
> agg=aggregate(df[,2],list(year=years(df[,1]),month=months(df[,1])),sum)
> levels(agg$month) # aggregate() automatically generates levels sorted by alphabet.
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
>
> fmonth=factor(months(df[,1]))
> levels(fmonth) # factor() automatically generates the correct order of levels.
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
> agg2=aggregate(df[,2],list(year=years(df[,1]),month=fmonth),sum)
> levels(agg2$month) # even if a factor with levels in the correct order is supplied, aggregate() sorts the levels by alphabet regardless.
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
>

Order seems to be correct.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
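Since the level order appears to differ between installations, one way to avoid depending on it is to re-impose the desired order on the result after aggregating. The following is a minimal sketch, not something proposed in the thread. It assumes base Date objects instead of chron dates, a fixed seed, and English month abbreviations taken from the built-in constant month.abb. The names mon and yr are illustrative.

```r
# Sketch only: base Date objects stand in for the chron dates in the thread.
set.seed(42)
dts <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
df  <- data.frame(date = dts, obs = rnorm(length(dts)))

# Build the month labels from the month number so they do not depend on locale.
mon <- factor(month.abb[as.integer(format(df$date, "%m"))], levels = month.abb)
yr  <- format(df$date, "%Y")

agg <- aggregate(df$obs, list(year = yr, month = mon), sum)

# Re-impose calendar order explicitly, whatever aggregate() returned.
agg$month <- factor(as.character(agg$month), levels = month.abb)
agg <- agg[order(agg$year, agg$month), ]

levels(agg$month)
```

Setting the levels after the call makes the result the same whether or not the installed version of aggregate() preserves the order of the supplied factor.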