Alain Guillet
2018-Feb-06 08:20 UTC
[R] Aggregate behaviour inconsistent (?) when FUN=table
Dear R users,

When I use aggregate with table as FUN, I get what I would call strange behaviour if it involves numerical vectors and one of their "levels" is not present in every "level" of the "by" variable:

---------------------------

> df <- data.frame(A=c(1,1,1,1,0,0,0,0),B=c(1,0,1,0,0,0,1,0),C=c(1,0,1,0,0,1,1,1))
> aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
  Group.1 A.0 A.1    B
1       0   1   2    3
2       1   3   2 2, 3

> table(df$C,df$B)

    0 1
  0 3 0
  1 2 3

---------------

As you can see, a comma appears in the column for variable B in the aggregate output, whereas when I call table I obtain the same result as if B were defined as a factor (I suppose this comes from the fact that "non-factor arguments a are coerced via factor", according to the details of the table help). I find this completely normal if I remember that aggregate first splits the data into subsets and then computes the table. But then I don't understand why it works differently with character vectors. Indeed, if I use character vectors, I get the same result as with factors:

------------------------

> df <- data.frame(A=factor(c("1","1","1","1","0","0","0","0")),B=factor(c("1","0","1","0","0","0","1","0")),C=factor(c("1","0","1","0","0","1","1","1")))
> aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
  Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3

> df <- data.frame(A=factor(c(1,1,1,1,0,0,0,0)),B=factor(c(1,0,1,0,0,0,1,0)),C=factor(c(1,0,1,0,0,1,1,1)))
> aggregate(df[1:2],list(df$C),table,simplify = TRUE,drop=TRUE)
  Group.1 A.0 A.1 B.0 B.1
1       0   1   2   3   0
2       1   3   2   2   3

---------------------

Would it be possible to clarify this behaviour in the aggregate help, since the result is not completely consistent with what the table help leads one to expect? Or would it be possible to have the same results independently of the vector type?
This post was rejected on the R-devel mailing list so I ask my question here as suggested.

Best regards,
Alain Guillet

--
Alain Guillet
Statistician and Computer Scientist

SMCS - IMMAQ - Université catholique de Louvain
http://www.uclouvain.be/smcs

Bureau c.316
Voie du Roman Pays, 20 (bte L1.04.01)
B-1348 Louvain-la-Neuve
Belgium

Tel: +32 10 47 30 50

Accès: http://www.uclouvain.be/323631.html
Jeff Newmiller
2018-Feb-06 15:33 UTC
[R] Aggregate behaviour inconsistent (?) when FUN=table
The normal input to a factory that builds cars is car parts. Feeding whole trucks into such a factory is likely to yield odd-looking results.

Both aggregate and table do similar kinds of things, but yield differently constructed outputs. The output of the table function is not well-suited to be used as the aggregated value to be compiled into a data frame by the aggregate function, so having aggregate call the table function will yield surprises.

I am having some difficulty deciphering what it is you are trying to accomplish with all this, so I will guess that you are trying to reproduce the information output from

table( df$C, df$B )

so

aggregate( df$A, df[ , c( "C", "B" ) ], length )

but if that isn't what you want then perhaps you can clarify what result you want to see and we can help you get there.

--
Sent from my phone. Please excuse my brevity.
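As a minimal sketch of how the two calls suggested above relate, assuming the numeric df from the original post: table gives a wide cross-tabulation that includes empty cells, while aggregate with length gives one row per combination that actually occurs. The names tab, counts and from_tab are illustrative only.

```r
# Data from the original post.
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Cross-tabulation: a 2 x 2 table, including the empty cell C = 0, B = 1.
tab <- table(df$C, df$B)

# Long-format counts: one row per (C, B) combination present in the data.
# When a bare vector is aggregated, the result column is named 'x'.
counts <- aggregate(df$A, df[, c("C", "B")], length)

# Look up each aggregate row in the table by its (C, B) labels.
from_tab <- mapply(function(c_val, b_val) tab[as.character(c_val), as.character(b_val)],
                   counts$C, counts$B)

print(tab)
print(counts)
print(all(from_tab == counts$x))
```

The empty combination (C = 0, B = 1) shows up as a 0 in the table but has no row in the aggregate result, because aggregate drops unused combinations by default (drop = TRUE).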
William Dunlap
2018-Feb-06 17:07 UTC
[R] Aggregate behaviour inconsistent (?) when FUN=table
Don't use aggregate's simplify=TRUE when FUN() produces return values of various dimensions. In your case, the shape of table(subset)'s return value depends on the number of levels in the factor 'subset'. If you make B a factor before splitting it by C, each split will have the same number of levels (2). If you split it and then let table convert each split to a factor, one split will have 1 level and the other 2.

To see the details of the output, use str() instead of print().

Bill Dunlap
TIBCO Software
wdunlap tibco.com
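The point about levels can be checked directly. The following is a minimal sketch, assuming the numeric df from the original post; per_group_numeric, per_group_factor and res are illustrative names, and split plus lapply is used here only to imitate what aggregate does internally.

```r
# Data from the original post.
df <- data.frame(A = c(1, 1, 1, 1, 0, 0, 0, 0),
                 B = c(1, 0, 1, 0, 0, 0, 1, 0),
                 C = c(1, 0, 1, 0, 0, 1, 1, 1))

# Split first, then tabulate: each piece only knows the values it contains,
# so the tables can have different lengths.
per_group_numeric <- lapply(split(df$B, df$C), table)
print(sapply(per_group_numeric, length))

# Factor first, then split: every piece carries the full set of levels,
# so all tables have the same length.
per_group_factor <- lapply(split(factor(df$B), df$C), table)
print(sapply(per_group_factor, length))

# str() shows how aggregate stored each column of the result.
res <- aggregate(df[1:2], list(df$C), table)
str(res)
```

When the per-group tables have a common length greater than 1, aggregate can bind them into a matrix column (A here); when the lengths differ, the column stays a list (B here), which is why print() shows "2, 3".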
Alain Guillet
2018-Feb-06 17:17 UTC
[R] Aggregate behaviour inconsistent (?) when FUN=table
Thank you for your response. Note that with R 3.4.3, I get the same result with simplify=TRUE or simplify=FALSE.

My problem was that the behaviour was different depending on whether I defined my columns as character or as numeric, but a few minutes ago I discovered that there is also a stringsAsFactors option in the data.frame function. So yes, it was a stupid question and I apologize for it.
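A minimal sketch of the stringsAsFactors effect mentioned above, using only columns B and C from the original post as character vectors. The object names are illustrative. stringsAsFactors is passed explicitly because its default is TRUE in the R 3.4.3 discussed here but changed to FALSE in R 4.0.0.

```r
# Character versions of B and C from the original post.
b_chr <- c("1", "0", "1", "0", "0", "0", "1", "0")
c_chr <- c("1", "0", "1", "0", "0", "1", "1", "1")

# Same input, two settings of stringsAsFactors.
df_factor <- data.frame(B = b_chr, C = c_chr, stringsAsFactors = TRUE)
df_char   <- data.frame(B = b_chr, C = c_chr, stringsAsFactors = FALSE)

print(class(df_factor$B))
print(class(df_char$B))

# Factor columns keep all levels in every group, so the per-group tables
# line up; plain character columns behave like the numeric case.
res_factor <- aggregate(df_factor["B"], list(df_factor$C), table)
res_char   <- aggregate(df_char["B"], list(df_char$C), table)

str(res_factor)
str(res_char)
```

With the factor version, B comes back as a matrix with one column per level; with the character version it comes back as a list column, as in the numeric example.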