On 18-Jul-10 05:47:03, Suresh Singh wrote:
> I have a data file in which one of the columns is country code and NA
> is the
> code for Namibia.
> When I read the data file using read.csv, NA for Namibia is being
> treated as
> null or "NA"
>
> How can I prevent this from happening?
>
> I tried the following but it didn't work
> input <- read.csv("padded.csv",header = TRUE,as.is = c("code2"))
>
> thanks,
> Suresh
I suppose this was bound to happen, and in my view it represents
a bit of a mess! With a test file temp.csv:
Code,Country
DE,Germany
IT,Italy
NA,Namibia
FR,France
X <- read.csv("temp.csv")
X
# Code Country
# 1 DE Germany
# 2 IT Italy
# 3 <NA> Namibia
# 4 FR France
which(is.na(X))
# [1] 3
exactly as Suresh describes. It does not help to surround the NA
in temp.csv with quotes:
Code,Country
DE,Germany
IT,Italy
"NA",Namibia
FR,France
leads to exactly the same result. And I have tried every variation
I can think of for "as.is" and "colClasses", still with exactly the
same result!
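For anyone who wants to repeat the experiment without creating
temp.csv by hand, here is a minimal sketch in base R. The temporary
file and the names f, X1 and X2 are purely illustrative and are not
part of Suresh's setup; only two of the possible "as.is" and
"colClasses" variations are shown.

```r
# Write the four-row test file to a temporary location
f <- tempfile(fileext = ".csv")
writeLines(c("Code,Country",
             "DE,Germany",
             "IT,Italy",
             "NA,Namibia",
             "FR,France"), f)

# Two variations: keep strings as character via as.is,
# or force every column to character via colClasses
X1 <- read.csv(f, as.is = TRUE)
X2 <- read.csv(f, colClasses = "character")

# Inspect which codes were read as missing values
is.na(X1$Code)
is.na(X2$Code)
```

In both calls the column is read as character rather than factor, but
the Namibia entry is still treated as missing.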
Conclusion: If an entry in a data file is intended to become the
character value "NA", there seems to be no way of reading it in
directly. This should not be so: it should be preventable!
As a cure, assuming that no other value in the Country Code is
actually missing (and so should be <NA>), then (with Suresh's
naming) I would suggest, subsequent to reading in the file,
something like the following. The complication is that the variable
code2 is now a factor, and you cannot simply assign a character
value "NA" to its <NA> value -- "NA" is not one of its levels, so
you will get a warning and the value will stay <NA>.
Hence:
ix <- which(is.na(input$code2))
Y <- as.character(input$code2)
Y[ix] <- "NA"
input$code2 <- factor(Y)
The corresponding code for my test example is:
ix <- which(is.na(X$Code))
Y <- as.character(X$Code)
Y[ix] <- "NA"
X$Code <- factor(Y)
X
# Code Country
# 1 DE Germany
# 2 IT Italy
# 3 NA Namibia
# 4 FR France
which(is.na(X))
# integer(0)
So that works.
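If the same repair is needed for several columns, the steps can be
wrapped in a small function. This is only a sketch: the name
na_to_string is hypothetical (it is not an existing R function), and
it rests on the same assumption as before, namely that no value in
the column is genuinely missing.

```r
# Hypothetical helper: turn every <NA> in a column back into the string "NA".
# Only safe if none of the <NA> entries is a genuinely missing value.
na_to_string <- function(x) {
  was_factor <- is.factor(x)
  y <- as.character(x)      # work on the character representation
  y[is.na(y)] <- "NA"       # replace missing entries by the literal string
  if (was_factor) factor(y) else y   # restore the original type
}

# Small illustration with a made-up factor of country codes
codes <- factor(c("DE", "IT", NA, "FR"))
fixed <- na_to_string(codes)
levels(fixed)
any(is.na(fixed))
```

With Suresh's naming this would be used as
input$code2 <- na_to_string(input$code2).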
There ought to be an option in read.csv() and friends which suppresses
the conversion of a string "NA" found in input into an <NA> value.
Maybe there is -- but, if so, it is not visible in the documentation!
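For what it is worth, the help page ?read.table does list an
"na.strings" argument, whose default value is "NA". The following is
a minimal sketch of how it might be used, assuming that an empty
field is the only thing that should count as missing in the file
(an assumption about the data, not something Suresh stated). The
temporary file is again purely illustrative.

```r
# Recreate the four-row test file in a temporary location
f <- tempfile(fileext = ".csv")
writeLines(c("Code,Country",
             "DE,Germany",
             "IT,Italy",
             "NA,Namibia",
             "FR,France"), f)

# na.strings = "" means that only empty fields are read as <NA>;
# the string "NA" is then kept as an ordinary value
X <- read.csv(f, na.strings = "")
X$Code
which(is.na(X))
```

For Suresh's file the corresponding call would presumably be
read.csv("padded.csv", header = TRUE, na.strings = ""), though I have
not tried it on his data.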
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 18-Jul-10 Time: 09:25:05
------------------------------ XFMail ------------------------------