David Kane
2002-May-16 19:24 UTC
[R] Using merge can convert character variables to factor
I am not sure if this is a bug or a feature, but I could not find it
documented. In certain circumstances, using merge can convert a character
variable to factor. Consider a simple example:

> x <- data.frame(a = 1:4)
> y <- data.frame(b = LETTERS[1:3])
> z <- merge(x, y, by = 0)
> unlist(lapply(z, data.class))
  Row.names           a           b
   "factor"   "integer"    "factor"

So far, so good. b should be a factor since it is a factor in y.

> is.factor(y$b)
[1] TRUE

Changing b to be a character variable works as well.

> y$b <- as.character(y$b)
> z <- merge(x, y, by = 0)
> unlist(lapply(z, data.class))
  Row.names           a           b
   "factor"   "integer" "character"

But when we change the merge to include all of the x rows in the resulting
dataframe, we get:

> z <- merge(x, y, by = 0, all.x = TRUE)
> unlist(lapply(z, data.class))
  Row.names           a           b
   "factor"   "integer"    "factor"

I think that b should still be a character variable in this case. If this is a
bug, please let me know and I would be happy to submit it.

> R.version
         _
platform sparc-sun-solaris2.6
arch     sparc
os       solaris2.6
system   sparc, solaris2.6
status
major    1
minor    5.0
year     2002
month    04
day      29
language R

Thanks,

Dave Kane
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
ripley@stats.ox.ac.uk
2002-May-16 19:56 UTC
[R] Using merge can convert character variables to factor
On Thu, 16 May 2002, David Kane wrote:

> I am not sure if this is a bug or a feature, but I could not find it
> documented. In certain circumstances, using merge can convert a character
> variable to factor. Consider a simple example:
>
> > x <- data.frame(a = 1:4)
> > y <- data.frame(b = LETTERS[1:3])
> > z <- merge(x, y, by = 0)
> > unlist(lapply(z, data.class))
>   Row.names           a           b
>    "factor"   "integer"    "factor"
>
> So far, so good. b should be a factor since it is a factor in y.
>
> > is.factor(y$b)
> [1] TRUE
>
> Changing b to be a character variable works as well.
>
> > y$b <- as.character(y$b)
> > z <- merge(x, y, by = 0)
> > unlist(lapply(z, data.class))
>   Row.names           a           b
>    "factor"   "integer" "character"
>
> But when we change the merge to include all of the x rows in the resulting
> dataframe, we get:
>
> > z <- merge(x, y, by = 0, all.x = TRUE)
> > unlist(lapply(z, data.class))
>   Row.names           a           b
>    "factor"   "integer"    "factor"

sapply would be better, BTW.

> I think that b should still be a character variable in this case.

Well, b is character in x and factor in y. Your example is inconsistent:
perhaps it should complain at you?

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
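
A minimal sketch of the same check written with sapply in place of
unlist(lapply(...)), plus one defensive workaround: converting b back to
character after the merge if it arrives as a factor. The
stringsAsFactors = FALSE argument and the as.character() step are assumptions
added for illustration and were not proposed in the thread. Whether the
conversion branch runs at all depends on the R version in use.

```r
# x has four rows and y only three, so all.x = TRUE leaves one unmatched row.
x <- data.frame(a = 1:4)

# Assumption: request a character column explicitly rather than relying on
# the version-dependent default of data.frame().
y <- data.frame(b = LETTERS[1:3], stringsAsFactors = FALSE)

# Merge on row names (by = 0), keeping every row of x.
z <- merge(x, y, by = 0, all.x = TRUE)

# sapply returns the named character vector directly, one class per column.
classes <- sapply(z, data.class)
print(classes)

# Defensive step: if merge handed back a factor, restore the character type.
if (is.factor(z$b)) {
  z$b <- as.character(z$b)
}
```

After this runs, z$b is a character vector with NA in the row of x that has no
match in y, whichever type merge itself returned.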