David Kane
2001-Sep-28 16:04 UTC
[R] Summary of Characters vectors, NA's and "" in merges
Thanks to Brian Ripley, Gregory Warnes, and Dennis Murphy for considering my
problem about "NA" in character strings. The nub of the issue seems to be that
you cannot have a string with "NA" in it in a character vector in R without it
being interpreted as meaning NA (i.e., not available). The only work-arounds
involve renames of various sorts.

Perhaps this is more appropriate for r-devel, but I was wondering what the
future holds for character vectors in R, i.e., will this always be a
problem? Although I am not smart enough to understand the Green Book, there is
a discussion following page 200 that *seems* to suggest that the usage of a
string class may make it easier to deal with this issue.

Is there anything coming down the pike on this point?

Greg suggested:

Perhaps the simplest thing would be to change occurrences of "NA" (meaning
Nabisco) to something similar like "NA." before placing the variable in a
dataframe....

> a <- data.frame(x = 1:4)
> y <- c("NA","a","b")
> y[y=="NA"] <- "NA."
> b <- data.frame(x = 1:3, y = y)
> merge(a, b, all.x = TRUE)
  x   y
1 1 NA.
2 2   a
3 3   b
4 4  NA

This isn't very clean, but it's simple...

Dennis suggested:

This might be a little naive, but...since R *is* case sensitive, would "Na" for
Nabisco be a workable substitute?

> str <- c("A","B","NA","Na","NA")
> which(is.na(str))
[1] 3 5

Thanks again to all,

Dave

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
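A minimal sketch of the rename work-around Greg describes, wrapped as a reversible pair of helpers. The names protect_na and restore_na and the default "NA." placeholder are assumptions made for illustration and do not come from any package. The sketch also assumes that no genuine value already equals the placeholder, and restoring is only meaningful where the R version in use keeps the string "NA" distinct from a missing value.

```r
# Hypothetical helpers (not from any package): swap the literal string "NA"
# for a placeholder before a merge, and swap it back afterwards.
protect_na <- function(y, placeholder = "NA.") {
  hit <- which(y == "NA")        # which() skips positions where the test is NA
  y[hit] <- placeholder
  y
}

restore_na <- function(y, placeholder = "NA.") {
  y <- as.character(y)           # the merged column may come back as a factor
  hit <- which(y == placeholder)
  y[hit] <- "NA"
  y
}

a <- data.frame(x = 1:4)
b <- data.frame(x = 1:3, y = protect_na(c("NA", "a", "b")))
m <- merge(a, b, all.x = TRUE)

unmatched <- is.na(m$y)          # rows of a with no partner in b
literal   <- which(m$y == "NA.") # rows that held the literal string
restored  <- restore_na(m$y)     # placeholder turned back into "NA"
```

Recording unmatched before calling restore_na keeps the two meanings separate even if the restored column is later printed, where they can look alike.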
Prof Brian Ripley
2001-Sep-28 16:51 UTC
[R] Summary of Characters vectors, NA's and "" in merges
On Fri, 28 Sep 2001, David Kane wrote:

> Thanks to Brian Ripley, Gregory Warnes, and Dennis Murphy for considering my
> problem about "NA" in character strings. The nub of the issue seems to be that
> you cannot have a string with "NA" in it in a character vector in R without it
> being interpreted as meaning NA (i.e., not available). The only work-arounds
> involve renames of various sorts.
>
> Perhaps this is more appropriate for r-devel, but I was wondering what the
> future holds for character vectors in R, i.e., will this always be a
> problem? Although I am not smart enough to understand the Green Book, there is
> a discussion following page 200 that *seems* to suggest that the usage of a
> string class may make it easier to deal with this issue.
>
> Is there anything coming down the pike on this point?

Well, we can't change character vectors without invalidating the integrity
of lots of saved objects. One could use another class, but then you
would need functions to handle that class. In the case in point that
won't help much as merge.data.frame does an as.character when doing the
matching, and a few other things (see below).

The class string exists in S-PLUS 6 but is almost unused. You can do

> foo <- as(c("NA", "OK"), "string")
> foo
[1] "NA" "OK"
> is.na(foo)
[1] F F
> is.na(foo[2]) <- T
> foo
[1] "NA" <NA>
> is.na(foo)
[1] F T

# but be careful:
> foo[2] <- NA
> foo
[1] "NA" "NA"

Note that you can do this with factors, and I tested it previously on your
example. Start with

x <- structure(c(1, 2, NA), levels = c("NA", "OK"), class="factor")
> x
[1] NA OK NA
Levels: NA OK

Here the first is "NA" and the third really is missing. So in your
original example

> a <- data.frame(x = 1:4)
> b <- data.frame(x = 1:3, y = factor(c("NA", "a", "b"), exclude=""))
> m <- merge(a, b, all.x = TRUE)
> m
  x  y
1 1 NA
2 2  a
3 3  b
4 4 NA

you have lost the distinction (look and see) because of

    y[(lxy + 1):(lxy + nxx), ] <- NA

and that suggests that [<-.factor is not quite right.
That shows the subtleties involved: it does not work in S with string
classes either.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
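The difference between a factor level literally named "NA" and a genuinely missing entry is hard to see in printed output, so it can help to test for it directly. A minimal sketch, assuming a recent R version in which factor codes must be integers (hence 1L and 2L here rather than the doubles in the message above). Whether merge keeps the distinction depends on the R version, so the last lines report it rather than assume it. The variable names are illustrative only.

```r
# A factor with a level literally named "NA" and one genuinely missing entry.
# Integer codes are used because recent R versions reject non-integer codes.
x <- structure(c(1L, 2L, NA), levels = c("NA", "OK"), class = "factor")

missing_entries <- is.na(x)                          # missing code
literal_na      <- !is.na(x) & levels(x)[x] == "NA"  # level named "NA"

# The merge from the message above, inspected with is.na() instead of print().
a <- data.frame(x = 1:4)
b <- data.frame(x = 1:3, y = factor(c("NA", "a", "b"), exclude = ""))
m <- merge(a, b, all.x = TRUE)

merged_missing <- is.na(m$y)   # which rows of m does R treat as missing?
```

If merged_missing is TRUE for the first row as well as the last, the literal level has been collapsed into a missing value, which is the loss of distinction described above.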