David Kane
2002-May-16 21:49 UTC
[R] is.na() can coerce character vectors to be factors within a dataframe
Thanks to Brian Ripley for suggesting, in reply to my previous post about a problem with merge, that I trace through merge.data.frame. I did so with my test case and all seemed to be well until I got to:

    if (all.x)
        for (i in seq(along = y))
            is.na(y[[i]]) <- (lxy + 1):(lxy + nxx)

I believe that this code takes y (which has been expanded to be the same size as x) and sets to NA the observations that should be NA because they were not in y. I think that this works fine, except for variables in y that are character. In that case, it converts them to factor. Consider a simple example:

> test <- data.frame(var = LETTERS[1:3])
> test$var <- as.character(test$var)
> test
  var
1   A
2   B
3   C
> is.na(test[[1]]) <- 2
> test
   var
1    A
2 <NA>
3    C
> is.factor(test$var)
[1] TRUE

Note that no problems arise if var is factor or numeric.

This does not seem to be a problem with character vectors on their own, outside a data frame:

> z <- LETTERS[1:3]
> z
[1] "A" "B" "C"
> is.na(z) <- 2
> z
[1] "A" NA  "C"
> is.factor(z)
[1] FALSE

The truly strange thing (at least for me) is that, as far as R is concerned, test[[1]] and z are identical:

> z <- LETTERS[1:3]
> test <- data.frame(var = LETTERS[1:3])
> test$var <- as.character(test$var)
> identical(z, test[[1]])
[1] TRUE

So why is.na() would coerce to factor for one but not for the other is a bit of a mystery to me. Presumably it's a two-stage process whereby first the NA is inserted in the character vector var, but then, when var is placed back into test, some sort of forced conversion takes place. But I really shouldn't speculate (further!) about something I know so little about.

Question:

1) Is the conversion of character vectors to factor vectors within dataframes a feature or a bug of is.na()? Or am I misunderstanding the whole situation?
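The two-stage guess above could be checked by carrying out the two stages by hand. This is only a minimal sketch, assuming that is.na(test[[1]]) <- 2 is evaluated as extract, modify, reassign (the usual way R handles nested replacement calls). The name tmp is introduced here purely for illustration, and the result of the two is.factor() calls will depend on the R version.

```r
# Same setup as in the example above.
test <- data.frame(var = LETTERS[1:3])
test$var <- as.character(test$var)

# Stage 1: pull the column out and set the NA on the bare vector.
tmp <- test[[1]]
is.na(tmp) <- 2
is.factor(tmp)        # is the vector already a factor at this point?

# Stage 2: put the modified vector back into the data frame.
test[[1]] <- tmp
is.factor(test$var)   # is the column a factor after reassignment?
```

If the first is.factor() call returns FALSE and the second returns TRUE, the conversion would be happening on reassignment into the data frame rather than inside is.na<- itself.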
Thanks,

David Kane

-- 
David Kane
Geode Capital Management
617-563-0122
david.d.kane at fmr.com