Jan Theodore Galkowski
2009-Aug-04 05:22 UTC
[R] "na.strings" and the like; suspending interpretation of "NA"
Can someone point me to the proper place in the documentation or on the Wiki where I can learn how to get R to stop interpreting the string "NA" as something special? I have a table in a database which contains (among other things) country codes and continent codes. The standard set of two-letter codes includes "NA" to denote "North America". I learned of the "na.strings" parameter for RODBC's "sqlQuery", which can shut down this interpretation when data is read in.

However, in the program which uses this data, I must have some other instance where the "NA" gets spontaneously interpreted as "not available", shows up in vectors and lists as "<NA>", and breaks functions. I temporarily solved the problem by redefining all instances of "NA" in the database as "NAC". It still would be good to know a general solution. I've seen something mentioned in conjunction with "options", but I'm not sure what that is about.

Thanks much,

- Jan, Akamai Technologies, Cambridge, MA
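For context on the parameter mentioned above, here is a minimal sketch of what na.strings controls at read time. It uses base R's read.csv on an inline string rather than RODBC's sqlQuery, so it runs without a database; the two-row table and its column names are made up for illustration.

```r
# Made-up sample data: the continent code "NA" stands for North America.
csv <- "country,continent\nUS,NA\nFR,EU\n"

# Default behaviour: na.strings = "NA", so the code is read as missing.
default <- read.csv(text = csv, stringsAsFactors = FALSE)

# Treat only empty fields as missing, so the literal "NA" survives.
kept <- read.csv(text = csv, na.strings = "", stringsAsFactors = FALSE)

print(is.na(default$continent))
print(kept$continent)
```

This only protects the values while they are being read. Any later step that re-parses the strings with its own default of na.strings = "NA" can still turn them into missing values.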
Daniel Nordlund
2009-Aug-04 06:45 UTC
[R] "na.strings" and the like; suspending interpretation of "NA"
Jan,

If you search the help for NA, i.e.

?NA

you will see:

Details

The NA of character type is distinct from the string "NA". Programmers who need to specify an explicit string NA should use NA_character_ rather than "NA", or set elements to NA using is.na<-.

So one can do the following:

> s <- 'NA'
> s
[1] "NA"
> is.na(s)
[1] FALSE
> s2 <- LETTERS[1:6]
> s2[6] <- NA
> s2
[1] "A" "B" "C" "D" "E" NA
> is.na(s2)
[1] FALSE FALSE FALSE FALSE FALSE  TRUE

Notice that in string s, the characters NA are surrounded by quotes, and is.na() returns FALSE. In s2, the missing value NA is not surrounded by quotes, and is.na() returns TRUE for s2[6].

So R itself does not confuse "NA" with character type NA. You will need to give more detail about how you are using RODBC, how your original data are structured, and where in your program values are getting converted to NA, before anyone can give you much help.

Dan

Daniel Nordlund
Bothell, WA USA
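The help text quoted above also names NA_character_ and is.na<-, which the session does not exercise. A minimal sketch of both follows, assuming a made-up vector of continent codes; the variable names are illustrative only.

```r
# Made-up continent codes; "NA" here is the literal string for North America.
continents <- c("NA", "EU", "AS", "NA")
print(is.na(continents))        # no element is missing

observed <- continents
observed[3] <- NA_character_    # an explicit missing value of character type
is.na(observed) <- 2            # mark element 2 as missing via is.na<-

# Count literal "NA" strings and genuinely missing values separately.
n_literal <- sum(observed == "NA", na.rm = TRUE)
n_missing <- sum(is.na(observed))
print(c(literal = n_literal, missing = n_missing))
```

The two counts stay separate, which is the distinction the help page describes: comparing against "NA" finds the country code, while is.na() finds only the missing entries.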
Peter Dalgaard
2009-Aug-04 07:05 UTC
[R] "na.strings" and the like; suspending interpretation of "NA"
The general paradigm is that this shouldn't happen...

Back in the old days, R had no such thing as character NA, and users had to sort out the North America, noradrenaline, Neil Armstrong, etc., issues for themselves. Nowadays we do have a calculus that preserves "NA" as distinct from <NA>; so if one is converted to the other, it could signify a bug. It could also be due to particularly silly code on your part, but in either case we need to see the effect narrowed down to a reproducible stretch of code.

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
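One way to narrow the effect down, sketched under assumptions: count literal "NA" strings and missing values after each processing step and see where the counts change. na_census is a hypothetical helper, and type.convert merely stands in for whatever step does the conversion in the real program, which the thread does not identify.

```r
# Hypothetical helper: counts literal "NA" strings versus missing values.
na_census <- function(x) {
  x <- as.character(x)
  c(literal = sum(!is.na(x) & x == "NA"), missing = sum(is.na(x)))
}

codes <- c("NA", "EU", "NA")    # made-up codes, intact as read

step1 <- toupper(codes)                       # harmless step
step2 <- type.convert(step1, as.is = TRUE)    # re-parses with na.strings = "NA"

print(na_census(step1))
print(na_census(step2))
```

The step at which the "literal" count drops and the "missing" count rises is the stretch of code worth posting.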