Albrecht, Dr. Stefan (AZ Private Equity Partner)
2007-Apr-19 15:39 UTC
[R] rbind() of factors in data.frame
Dear all,

I would like to ask whether it is intended that combining two data frames with rbind() does not sort the factor levels of the combined data frame when the frames contain factor columns.

> str(rbind(data.frame(a = factor(c(4, 3))), data.frame(a = factor(c(2, 1)))))
'data.frame': 4 obs. of 1 variable:
 $ a: Factor w/ 4 levels "3","4","1","2": 2 1 4 3

I would expect the combined factor levels to be sorted, as long as neither factor is ordered.

With many thanks and best regards,
Stefan

____________________________________
Dr. Stefan Albrecht, CFA
Allianz Private Equity Partners GmbH
Königinstr. 19 | 80539 Munich | Germany
Phone: +49.(0)89.3800.18317
Fax: +49.(0)89.3800.818317
EMail: stefan.albrecht@allianz.com
Web: www.apep.com
Please, no. It is already annoying enough that levels are sorted when creating a factor. Don't compound it by extending this to other functions. In concept the order of the levels of a factor is irrelevant (although in practice it makes a big difference, e.g. when plotting). If so, then why is alphabetic order preferred over any other? Why not leave them in the order the user provided?

Rich Raubertas
Merck & Co.
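For readers who want levels kept in the order the data supplies them, passing `levels` explicitly to `factor()` is one way to avoid the default sorting. The following is a minimal sketch; the vector `x` and the variable names are illustrative and do not come from the thread.

```r
# Illustrative data (not from the thread).
x <- c("medium", "low", "high", "low")

# Default: levels are the sorted unique values; the sort order depends on the locale.
f_sorted <- factor(x)

# Keep the levels in the order in which the values first appear.
f_seen <- factor(x, levels = unique(x))

# Or state the intended order explicitly.
f_explicit <- factor(x, levels = c("low", "medium", "high"))

levels(f_seen)      # "medium" "low" "high"
levels(f_explicit)  # "low" "medium" "high"
```

Either form leaves the values themselves unchanged; only the level order, and hence things like plotting order, differs.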
On Thu, 19 Apr 2007, Albrecht, Dr. Stefan (AZ Private Equity Partner) wrote:

> I would like to inquire, if it is a desired feature that the combination
> with rbind() of two data frames with factors columns does not sort the
> factors levels of the combined data frame.

Yes, and a documented one. To wit, the help file says

    Factors have their levels expanded as necessary (in the order of the
    levels of the levelsets of the factors encountered) and the result is
    an ordered factor if and only if all the components were ordered
    factors. (The last point differs from S-PLUS.)

> > str(rbind(data.frame(a = factor(c(4, 3))), data.frame(a = factor(c(2, 1)))))
> 'data.frame': 4 obs. of 1 variable:
> $ a: Factor w/ 4 levels "3","4","1","2": 2 1 4 3
>
> I would expect the combined factor levels to be sorted, as long as both
> factors are not ordered.

I would find that very undesirable: if the order matters at all, it seems rare that alphabetic (which is highly locale dependent) is optimal. In any case, if you rbind factors with the same levelset (perhaps the only really sensible usage), you do not want the result to have a different levelset.

[And why would _you_ expect it to do something other than the help page says?]

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
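If sorted levels are wanted after rbind(), they can be set explicitly once the frames are combined. The sketch below reuses the data from the original question; the names `d1`, `d2`, `combined` and `lv` are illustrative, and numeric ordering is assumed to be the order wanted here.

```r
d1 <- data.frame(a = factor(c(4, 3)))
d2 <- data.frame(a = factor(c(2, 1)))
combined <- rbind(d1, d2)

levels(combined$a)  # "3" "4" "1" "2": levels in the order encountered

# Reorder the levels numerically; the values themselves are unchanged.
lv <- levels(combined$a)
combined$a <- factor(combined$a, levels = lv[order(as.numeric(lv))])

levels(combined$a)        # "1" "2" "3" "4"
as.character(combined$a)  # "4" "3" "2" "1"
```

Ordering via `as.numeric()` rather than `sort()` on the character levels avoids the locale-dependent alphabetic ordering mentioned above, which would also place "10" before "2".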