prechelt at inf.fu-berlin.de
2007-Mar-21 15:18 UTC
[Rd] rbind.data.frame reacts on levels without factor (PR#9578)
Full_Name: Lutz Prechelt
Version: 2.4.1
OS: Windows XP
Submission from: (NULL) (160.45.111.67)

I stack a number of data.frames using rbind.
Each of these dataframes has a column 'authorname', which is a factor
and a column author = unclass(authorname) as piecewise pseudonyms.
When using rbind to stack these dataframes, R warns about invalid factor levels
and inserts all NAs in the author column.

The reason appears to be that rbind.data.frame looks for the presence of levels,
not actually for class==factor when deciding what to handle as a factor:
    if (!is.null(levels(xj))) {

I find this behavior surprising, hence dangerous, and it is not documented.
Rather, the documentation says:
"The 'rbind' data frame method takes the classes of the columns
from the first data frame, and matches columns by name (rather
than by position). Factors have their levels expanded as
necessary [...]"

The behavior has bitten me fairly hard, because I searched for the origin of the
warning in all the wrong places before finding the real one after about 3
hours.
(Although I still have not understood _why_ it results in that warning.)

I believe the behavior of rbind.data.frame should be fixed, so that it ignores
levels attributes when there is no factor class as well.

The alternative would be to just add a warning to the documentation that
'unclass' on factors is insufficient if users want to avoid factor handling for
rbind.
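The report itself contains no reproducible example. The following is a minimal sketch of the distinction it describes, using made-up data: the author names and the variable `author_plain` are illustrative assumptions, not taken from the report. It only shows that unclass() on a factor keeps the levels attribute, which is what a test of the form !is.null(levels(xj)) would pick up, whereas as.integer() drops it.

```r
# Hypothetical data; only the names 'authorname' and 'author' come from the report.
authorname <- factor(c("alice", "bob", "alice"))

# unclass() removes the class attribute but leaves "levels" in place.
author <- unclass(authorname)
is.factor(author)            # FALSE: no longer a factor
!is.null(levels(author))     # TRUE: the levels attribute survives

# as.integer() returns the bare integer codes with no attributes at all.
author_plain <- as.integer(authorname)
is.null(levels(author_plain))  # TRUE
```

To try the rbind step itself, each vector can be placed in a data frame next to `authorname` and the data frames stacked with rbind(). The warning and NA values described above were reported for R 2.4.1, and the outcome may differ in other versions.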
ripley at stats.ox.ac.uk
2007-Mar-21 21:13 UTC
[Rd] rbind.data.frame reacts on levels without factor (PR#9578)
There is no example to reproduce here. Please do show the courtesy to follow
the request at the bottom of every R-help posting and many other places and
provide some reproducible evidence to support your points.

On Wed, 21 Mar 2007, prechelt at inf.fu-berlin.de wrote:

> Full_Name: Lutz Prechelt
> Version: 2.4.1
> OS: Windows XP
> Submission from: (NULL) (160.45.111.67)
>
> I stack a number of data.frames using rbind.
> Each of these dataframes has a column 'authorname', which is a factor
> and a column author = unclass(authorname) as piecewise pseudonyms.
> When using rbind to stack these dataframes, R warns about invalid factor levels
> and inserts all NAs in the author column.
>
> The reason appears to be that rbind.data.frame looks for the presence of levels,
> not actually for class==factor when deciding what to handle as a factor:
>     if (!is.null(levels(xj))) {
>
> I find this behavior surprising, hence dangerous, and it is not documented.
> Rather, the documentation says:
> "The 'rbind' data frame method takes the classes of the columns
> from the first data frame, and matches columns by name (rather
> than by position). Factors have their levels expanded as
> necessary [...]"
>
> The behavior has bitten me fairly hard, because I searched for the origin of the
> warning in all the wrong places before finding the real one after about 3
> hours.
> (Although I still have not understood _why_ it results in that warning.)
>
> I believe the behavior of rbind.data.frame should be fixed, so that it ignores
> levels attributes when there is no factor class as well.
>
> The alternative would be to just add a warning to the documentation that
> 'unclass' on factors is insufficient if users want to avoid factor handling for
> rbind.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595