franz.quehenberger at medunigraz.at
2010-Feb-12 12:50 UTC
[Rd] aggregate: with 2 by variables in the result the 2nd by-variable is wrong (PR#14213)
Full_Name: Franz Quehenberger
Version: 2.10.1
OS: Windows XP
Submission from: (NULL) (145.244.10.3)

aggregate() is supposed to produce a data.frame that contains one row for each
combination of levels of the variables in the by list. The first columns of the
result contain these combinations of levels. With two by-variables, the second
by-variable always takes only one value. However, it works fine with one or
three by-variables.

The problem seems to be caused by this line of code in aggregate():

w <- as.data.frame(w, stringsAsFactors = FALSE)[which(!unlist(lapply(z, is.null))), , drop = FALSE]

or more specifically by:

[which(!unlist(lapply(z, is.null))), , drop = FALSE]

Kind regards
FQ

# demonstration of the aggregate bug in R 2.10.1
factor.a=rep(letters[1:3],4)
factor.b=rep(letters[4:5],each=3,times=2)
factor.c=rep(letters[4:5+2],each=6)
data=data.frame(factor.a,factor.b,factor.c,x)
x=1:12
#one by-variable works:
aggregate(x,list(a=factor.a),FUN=mean)
#three by-variables work fine:
aggregate(x,list(a=factor.a,b=factor.b,c=factor.b),FUN=mean)
#two by-variables do not produce the levels of the second by-variable correctly:
aggregate(x,list(a=factor.a,b=factor.b),FUN=mean)
# data
print(data)

++++++++++++++++++++++++++++++++++++++++++++++++++++
Result of the R code:
++++++++++++++++++++++++++++++++++++++++++++++++++++

> # demonstration of the aggregate bug in R 2.10.1
> factor.a=rep(letters[1:3],4)
> factor.b=rep(letters[4:5],each=3,times=2)
> factor.c=rep(letters[4:5+2],each=6)
> data=data.frame(factor.a,factor.b,factor.c,x)
> x=1:12
> #one by-variable works:
> aggregate(x,list(a=factor.a),FUN=mean)
  a   x
1 a 5.5
2 b 6.5
3 c 7.5
> #three by-variables work fine:
> aggregate(x,list(a=factor.a,b=factor.b,c=factor.b),FUN=mean)
  a b c x
1 a d d 4
2 b d d 5
3 c d d 6
4 a e e 7
5 b e e 8
6 c e e 9
> #two by-variables do not produce the levels of the second by-variable correctly:
> aggregate(x,list(a=factor.a,b=factor.b),FUN=mean)
  a b x
1 a d 4
2 b d 5
3 c d 6
4 a d 7
5 b d 8
6 c d 9
Warning message:
In data.frame(w, lapply(y, unlist, use.names = FALSE), stringsAsFactors = FALSE) :
  row names were found from a short variable and have been discarded
> # data
> print(data)
   factor.a factor.b factor.c  x
1         a        d        f  1
2         b        d        f  2
3         c        d        f  3
4         a        e        f  4
5         b        e        f  5
6         c        e        f  6
7         a        d        g  7
8         b        d        g  8
9         c        d        g  9
10        a        e        g 10
11        b        e        g 11
12        c        e        g 12
Peter Ehlers
2010-Feb-12 21:01 UTC
[Rd] aggregate: with 2 by variables in the result the 2nd by-variable is wrong (PR#14213)
franz.quehenberger at medunigraz.at wrote:
> With two by variables the second by-variable takes always only one value.
> However, it works fine with one or three by-variables.
>
>> #two by-variables do not produce the levels of the second by-variable correctly:
>> aggregate(x,list(a=factor.a,b=factor.b),FUN=mean)
>   a b x
> 1 a d 4
> 2 b d 5
> 3 c d 6
> 4 a d 7
> 5 b d 8
> 6 c d 9
> Warning message:
> In data.frame(w, lapply(y, unlist, use.names = FALSE), stringsAsFactors = FALSE) :
>   row names were found from a short variable and have been discarded

I don't see this in 2.10.1 nor in 2.11.0 (Windows Vista). I can't think of how
you might have got your result. Is there something you haven't mentioned?
What's your sessionInfo()?

--
Peter Ehlers
University of Calgary
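
For anyone retrying the report, here is a minimal sketch of a self-contained check. It assumes a fresh R session and defines x before anything uses it; the name res is introduced here for illustration only and is not part of the original report. It runs the two-by-variable call, prints which levels of the second by-variable appear in the result, and prints the session details asked for above.

```r
# Start from known inputs; x is defined before anything uses it.
x <- 1:12
factor.a <- rep(letters[1:3], 4)
factor.b <- rep(letters[4:5], each = 3, times = 2)

# Two by-variables: one row is expected per (a, b) combination.
# 'res' is a hypothetical name used only in this sketch.
res <- aggregate(x, list(a = factor.a, b = factor.b), FUN = mean)
print(res)

# Levels of the second by-variable that actually appear in the result.
print(sort(unique(as.character(res$b))))

# Version, platform, and attached packages to include with a report.
print(sessionInfo())
```

If the second print shows only one level, the sessionInfo() output is the first thing worth comparing against a session where the call behaves as expected.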