a296180@agate.fmr.com
2002-May-29 11:10 UTC
merge.data.frame can coerce character vectors to factor in some circumstances (PR#1608)
If the following two conditions are met:

  1) all.x is TRUE

  2) at least 1 row in x does not have a match in y

then any character vectors in y will be coerced to be factors. Here is a simple example (previously provided on r-devel):

  > x <- data.frame(a = 1:4)
  > y <- data.frame(b = LETTERS[1:3])
  > y$b <- as.character(y$b)
  > z <- merge(x, y, by = 0, all.x = TRUE)
  > z
    Row.names a    b
  1         1 1    A
  2         2 2    B
  3         3 3    C
  4         4 4 <NA>
  > sapply(z, data.class)
  Row.names         a         b 
   "factor" "numeric"  "factor" 

This problem could be fixed by changing the line in merge.data.frame:

  for (i in seq(along = y)) is.na(y[[i]]) <- (lxy + 1):(lxy + nxx)

to:

  for (i in seq(along = y)) y[((lxy + 1):(lxy + nxx)), i] <- NA

To the extent that this is a feature rather than a bug (if so, I would like to know why), then I would suggest that the following sentence be added to the documentation for merge at the end of the section on all.x:

  "Be aware that, if all.x equals `TRUE', character vectors in `y' will be converted to factors if any rows in `x' have no matching row in `y'."

Thanks,
Dave Kane

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Prof Brian D Ripley
2002-May-29 11:34 UTC
(PR#1608) merge.data.frame can coerce character vectors to factor in some circumstances (PR#1608)
On Wed, 29 May 2002 a296180@agate.fmr.com wrote:

> If the following two conditions are met:
>
> 1) all.x is TRUE
>
> 2) at least 1 row in x does not have a match in y
>
> then any character vectors in y will be coerced to be factors. Here is a simple
> example (previously provided on r-devel):
>
> > x <- data.frame(a = 1:4)
> > y <- data.frame(b = LETTERS[1:3])
> > y$b <- as.character(y$b)
> > z <- merge(x, y, by = 0, all.x = TRUE)
> > z
>   Row.names a    b
> 1         1 1    A
> 2         2 2    B
> 3         3 3    C
> 4         4 4 <NA>
> > sapply(z, data.class)
> Row.names         a         b 
>  "factor" "numeric"  "factor" 
>
> This problem could be fixed by changing the line in merge.data.frame:
>
>   for (i in seq(along = y)) is.na(y[[i]]) <- (lxy + 1):(lxy + nxx)
>
> to:
>
>   for (i in seq(along = y)) y[((lxy + 1):(lxy + nxx)), i] <- NA

But other problems would be introduced, as the two operations are not
equivalent (and the right one has been used).

> To the extent that this is a feature rather than a bug (if so, I would like to
> know why),

I have already patiently explained it to you. It is a side issue of
subscripting of data frames converting character columns to factor.
I have also given you a workaround.

> then I would suggest that the following sentence be added to the
> documentation for merge at the end of the section on all.x
>
> "Be aware that, if all.x equals `TRUE', character vectors in `y' will be
> converted to factors if any rows in `x' have no matching row in `y'."

As I said before, this is a consequence of the general rules. Data frames
are not designed to have character columns, and those who insist on using
them must make themselves aware of the consequences.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
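For readers who hit the coercion described in this thread, here is a minimal sketch of one post-hoc workaround: after merging, convert any factor columns that came from `y` back to character. This is an illustrative assumption, not the workaround Prof Ripley refers to (that one is not quoted here), and the variable names `from_y` and `as_fac` are hypothetical. It uses `stringsAsFactors = FALSE`, which current R versions accept; on recent R the merged column may already be character, in which case the loop simply does nothing.

```r
## Hypothetical post-merge workaround (not taken from the thread):
## restore character type for columns contributed by `y`.
x <- data.frame(a = 1:4)
y <- data.frame(b = LETTERS[1:3], stringsAsFactors = FALSE)

## Row 4 of x has no match in y, so b gets an NA in that row
z <- merge(x, y, by = 0, all.x = TRUE)

## Columns that came from y; assumes no name clashes with x
## (merge would otherwise rename them with suffixes such as ".y")
from_y <- intersect(names(y), names(z))

## Which of those columns ended up as factors
as_fac <- from_y[vapply(z[from_y], is.factor, logical(1))]

## Convert those columns back to character (no-op if none are factors)
for (col in as_fac) z[[col]] <- as.character(z[[col]])

str(z)
```

Restricting the conversion to `y`'s columns avoids silently changing factor columns that were deliberately factors in `x`.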
Possibly Parallel Threads
- is.na() can coerce character vectors to be factors within a dataframe
- patching ?merge to allow the user to keep the order of one of the two data.frame objects merged
- Summary of Characters vectors, NA's and "" in merges
- calleridname.agi patch to only overwrite name if it is missing
- Realtime Pattern Matching