David Kane
2001-Sep-26 12:10 UTC
[R] Characters vectors, NA's and "" in merges
I often use merge with dataframes that contain character vectors which have
elements that are sometimes "NA" (meaning the string NA, not the same thing,
obviously, as NA in a numeric or factor vector). For example, the stock ticker
for Nabisco was "NA". Unfortunately (for me), it seems like merge insists on
inserting "NA" for missing values. My question: Is there some way around this?
Here is a simple example:

> version
         _
platform sparc-sun-solaris2.6
arch     sparc
os       solaris2.6
system   sparc, solaris2.6
status
major    1
minor    3.0
year     2001
month    06
day      22
language R

> a <- data.frame(x = 1:4)
> b <- data.frame(x = 1:3, y = c("NA", "a", "b"))
> merge(a, b, all.x = TRUE)
  x  y
1 1 NA
2 2  a
3 3  b
4 4 NA

Rows 1:3 are what I expect them to be. Row 4 is "wrong" in the sense that
dataframe b did not contain a row for x = 4. Of course, there is a sense that
*any* value, including "", that is placed in row 4 is potentially
misleading. Perhaps I am misunderstanding the meaning of "NA" in a character
vector (i.e., I am not allowed to have "real" values that are that string).

If there were some way (an "nomatch" argument?) that the user could specify
what missing values are used for character strings, then I would be
fine. Again, I suspect that my real problem is not understanding how to specify
"NA" -- meaning Nabisco's ticker symbol -- in a character vector.

Any suggestions would be much appreciated.

Dave Kane
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
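One way to see what merge actually stored, rather than relying on the printed table (where a string "NA" and a missing value can look alike), is to inspect the column directly. The following is a minimal sketch that reuses a and b from the post; the name m is introduced here only for illustration, and how b$y is stored may differ between R versions, so no output is shown.

```r
# Rebuild the example from the post.
a <- data.frame(x = 1:4)
b <- data.frame(x = 1:3, y = c("NA", "a", "b"))

# Inspect how y was stored (character or factor) before merging.
str(b)
is.na(b$y)   # TRUE marks elements that R treats as missing

# m is a hypothetical name for the merged result.
m <- merge(a, b, all.x = TRUE)
is.na(m$y)   # row 4 has no match in b, so its y is missing
```

Comparing is.na(b$y) with is.na(m$y) shows which missing values were already present in b and which were introduced by all.x = TRUE.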
On Wed, 26 Sep 2001, David Kane wrote:

> I often use merge with dataframes that contain character vectors which have
> elements that are sometimes "NA" (meaning the string NA, not the same thing,
> obviously, as NA in a numeric or factor vector). For example, the stock ticker
> for Nabisco was "NA". Unfortunately (for me), it seems like merge insists on
> inserting "NA" for missing values. My question: Is there some way around this?
> Here is a simple example:
>
> > version
>          _
> platform sparc-sun-solaris2.6
> arch     sparc
> os       solaris2.6
> system   sparc, solaris2.6
> status
> major    1
> minor    3.0
> year     2001
> month    06
> day      22
> language R
>
> > a <- data.frame(x = 1:4)
> > b <- data.frame(x = 1:3, y = c("NA", "a", "b"))

Take a look. b$y is a factor with levels "a" and "b", and a missing first
value.

> > merge(a, b, all.x = TRUE)
>   x  y
> 1 1 NA
> 2 2  a
> 3 3  b
> 4 4 NA
>
> Rows 1:3 are what I expect them to be. Row 4 is "wrong" in the sense that
> dataframe b did not contain a row for x = 4. Of course, there is a sense that
> *any* value, including "", that is placed in row 4 is potentially
> misleading. Perhaps I am misunderstanding the meaning of "NA" in a character
> vector (i.e., I am not allowed to have "real" values that are that string).

That is the correct answer. Because you asked for all.x=TRUE, you got a
missing value there in row 4 col 2.

> If there were some way (an "nomatch" argument?) that the user could specify
> what missing values are used for character strings, then I would be
> fine. Again, I suspect that my real problem is not understanding how to specify
> "NA" -- meaning Nabisco's ticker symbol -- in a character vector.

You cannot avoid it being taken as the missing value, AFAIK.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
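The "nomatch" idea from the question can be approximated after the merge by replacing the missing values with a chosen string. What follows is a rough sketch, not something proposed in the thread. It assumes a recent R release in which the string "NA" in a character vector is distinct from a missing value, which is not the behaviour described in the reply above for R 1.3.0. The stringsAsFactors = FALSE argument keeps y as character; the name m and the filler "<no match>" are hypothetical.

```r
a <- data.frame(x = 1:4)
# Keep y as a character vector rather than a factor.
b <- data.frame(x = 1:3, y = c("NA", "a", "b"), stringsAsFactors = FALSE)

m <- merge(a, b, all.x = TRUE)

# Under the assumption above, only the genuinely missing entry (row 4)
# is replaced; the ticker string "NA" in row 1 is left alone.
m$y[is.na(m$y)] <- "<no match>"
m
```

Filling in after the merge keeps merge's own behaviour untouched, at the cost of one extra step per character column.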