Dear All,

on rbind:ing together a number of data.frames, I found that character variables are converted into factors. Since this occurred for a data identifier, it was a little inconvenient and, to me, unexpected. (The help page explains the general procedure used.) I also found that on forming a data frame, character variables are converted to factors.

The help page on read.table has the 'as.is' argument, which I suppose kind of suggests that character variables tend to get converted into factors. Is there such a "preference" for factors and should this behaviour be expected?

Example code

    d1 <- data.frame(id = letters[1:20], x = runif(20))
    d2 <- data.frame(id = paste(letters[1:20], letters[1:20], sep = ""),
                     x = rexp(20))
    d3 <- rbind(d1, d2)
    str(d1) # <- id is factor
    str(d2) # <- id is factor
    str(d3) # <- id is factor
    d1[["id"]] <- as.character(d1[["id"]])
    d2[["id"]] <- as.character(d2[["id"]])
    d3 <- rbind(d1, d2)
    str(d1) # <- id is character
    str(d2) # <- id is character
    str(d3) # <- id is factor

Regards,

Markus
--
Markus Jäntti <markus.jantti at iki.fi>
Statistics Finland
Prof Brian D Ripley
2003-Feb-12 16:14 UTC
[R] rbind.data.frame: character converted to factor
Read ?data.frame: that tells you to use

    d1 <- data.frame(id = I(letters[1:20]), x = runif(20))
    d2 <- data.frame(id = I(paste(letters[1:20], letters[1:20], sep = "")),
                     x = rexp(20))
    d3 <- rbind(d1, d2)

which of course works!

On 12 Feb 2003, Markus Jäntti wrote:

> Dear All,
>
> on rbind:ing together a number of data.frames, I found that
> character variables are converted into factors. Since this
> occurred for a data identifier, it was a little inconvenient
> and, to me, unexpected. (The help page explains the
> general procedure used.) I also found that on forming
> a data frame, character variables are converted to factors.

as documented in many places, including ?data.frame.

> The help page on read.table has the 'as.is' argument, which
> I suppose kind of suggests that character variables tend to
> get converted into factors. Is there such a "preference" for
> factors and should this behaviour be expected?

It's as documented.

> Example code
>
> d1 <- data.frame(id = letters[1:20], x = runif(20))
> d2 <- data.frame(id = paste(letters[1:20], letters[1:20], sep = ""),
>                  x = rexp(20))
> d3 <- rbind(d1, d2)
> str(d1) # <- id is factor
> str(d2) # <- id is factor
> str(d3) # <- id is factor
> d1[["id"]] <- as.character(d1[["id"]])
> d2[["id"]] <- as.character(d2[["id"]])
> d3 <- rbind(d1, d2)
> str(d1) # <- id is character
> str(d2) # <- id is character
> str(d3) # <- id is factor

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
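The I() approach above can be checked directly rather than by reading str() output. The following is a minimal sketch; the variable names id_is_factor and id_is_character are made up for illustration, and the set.seed() call is only there to make the random columns reproducible. Note that from R 4.0.0 onwards data.frame() defaults to stringsAsFactors = FALSE, so on a recent installation plain character columns would also survive rbind() unchanged; the check below should hold either way.

```r
# Sketch: build the two data frames with I() so that 'id' is stored as is,
# then confirm that rbind() has not turned it into a factor.
set.seed(1)  # assumption: only for reproducible random numbers

d1 <- data.frame(id = I(letters[1:20]), x = runif(20))
d2 <- data.frame(id = I(paste(letters[1:20], letters[1:20], sep = "")),
                 x = rexp(20))
d3 <- rbind(d1, d2)

# Hypothetical helper variables recording what we found
id_is_factor    <- is.factor(d3$id)
id_is_character <- is.character(d3$id)

str(d3)  # 'id' should be shown as a character column, not a factor
```

If id_is_factor comes out TRUE on some installation, that would indicate behaviour different from what is described in this thread, and ?data.frame and ?rbind for that R version are the places to look.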
Brian Ripley wrote:

> d1 <- data.frame(id = I(letters[1:20]), x = runif(20))
> d2 <- data.frame(id = I(paste(letters[1:20], letters[1:20], sep = "")),
>                  x = rexp(20))
> d3 <- rbind(d1, d2)
>
> which of course works!

Why ``of course''? It seems to me that there is no ``of course'' about it. It is completely counter-intuitive.

What appears to be going on is that the I() operator NOT ONLY puts its argument into the data frame ``as is'', but it ALSO tacks a class ``AsIs'' onto that argument which prevents it from being mucked around with thereafter.

This is a neat trick, but is fairly mysterious, and could have intricate ramifications. How can one discern all the impacts of an object's having ``AsIs'' as a class? (It would appear that objects of any structure and class can ``inherit from'' AsIs.)

It would be highly preferable not to have to use the I() operator, or the ``AsIs'' class at all, i.e. to have character vectors stay character vectors unless the user explicitly asks them to be converted to factors. However Splus introduced the contrary policy years ago, and R is stuck with it for compatibility reasons.

cheers,

Rolf Turner
rolf at math.unb.ca
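One way to get a feel for what the ``AsIs'' class does is to inspect an I()-wrapped object interactively. The sketch below is illustrative only; the object names (ids, cls, plain, asis_methods) are invented for the example. It shows that I() prepends "AsIs" to the class attribute, that the underlying data are still an ordinary character vector, that unclass() removes the marker again, and that methods() lists the functions that treat AsIs objects specially in the running R session.

```r
# Sketch: inspect what I() adds to an object.
ids <- I(c("a", "b", "c"))

cls <- class(ids)                      # the class attribute set by I()
still_character <- is.character(ids)   # the underlying type is unchanged
plain <- unclass(ids)                  # strip the class: plain character again

# I() prepends "AsIs" to whatever class the object already had
fac_cls <- class(I(factor("a")))

# Functions with methods for class "AsIs" in this session;
# the exact list depends on the R version and attached packages.
asis_methods <- methods(class = "AsIs")
print(asis_methods)
```

The list printed by methods() is probably the closest thing to an inventory of where the class has an effect, although code that tests inherits(x, "AsIs") directly, as data.frame() does, will not appear in it.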