NA's in a data frame are not handled properly, if the data frame was
read in using read.table (but I'm not sure if that is the reason of
the problems):

(I'm using Debian Linux 1.3)

If I read the file

***************
1 2
2 3
3 4
4 5
? 6
6 7
7 8
8 9
***************

into R using read.table, I run into trouble:

R : Copyright 1997, Robert Gentleman and Ross Ihaka
Version 0.49 Beta (April 23, 1997)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type "license()" for details.

R> x<-read.table("xxx", na.strings="?")
R> x
  V1 V2
1  1  2
2  2  3
3  3  4
4  4  5
5 NA  6
6  6  7
7  7  8
8  8  9
R> complete.cases(x)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
R> x$V1
[1] 1 2 3 4 NA 6 7 8
R> x$V1[5]
[1] NA
R> is.na(x$V1[5])
[1] FALSE

But if I create the same data frame in R, I get the expected results:

R> z<-data.frame(V1=c(1:4, NA, 6:8), y2=2:9)
R> z
     V1 y2
[1,]  1  2
[2,]  2  3
[3,]  3  4
[4,]  4  5
[5,] NA  6
[6,]  6  7
[7,]  7  8
[8,]  8  9
R> complete.cases(z)
[1]  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE
R> is.na(z$V1[5])
[1] TRUE

Can anybody confirm this?

-- 
-------------------------------------------------------------------
                        Friedrich Leisch
Institut für Statistik                      Tel: (+43 1) 58801 4541
Technische Universität Wien                  Fax: (+43 1) 504 14 98
Wiedner Hauptstraße 8-10/1071      Friedrich.Leisch@ci.tuwien.ac.at
A-1040 Wien, Austria             http://www.ci.tuwien.ac.at/~leisch
     PGP public key http://www.ci.tuwien.ac.at/~leisch/pgp.key
-------------------------------------------------------------------
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

On Fri, 27 Jun 1997, Friedrich Leisch wrote:

> NA's in a data frame are not handled properly, if the data frame was
> read in using read.table (but I'm not sure if that is the reason of
> the problems):

The problem is with the na.strings argument of read.table. If you use
this argument, any column containing an NA ends up as a factor. In your
example the first column is a factor (though you can't tell by looking
at it):

  > is.factor(x[,1])
  [1] TRUE
  > levels(x[,1])
  [1] "1"  "2"  "3"  "4"  "6"  "7"  "8"  "NA"

The reason is that read.table tries to handle na.strings twice. The
scan() function gets the na.strings argument and so translates the "?"
into "NA". The type.convert() function then also gets the na.strings
argument and assumes that NAs are indicated by "?", which is no longer
true. It sees the string "NA", decides that the data are not numeric,
and returns a factor.

The solution seems to be to stop passing na.strings to type.convert().
We need to keep na.strings in scan() to allow for missing character
data.

A patch is at the end of this message. This may break something else,
of course ;-).

Thomas Lumley
------------------------------------------------------+------
Biostatistics          : "Never attribute to malice what  :
Uni of Washington      :  can be adequately explained by  :
Box 357232             :  incompetence" - Hanlon's Razor  :
Seattle WA 98195-7232  :                                  :
------------------------------------------------------------

*** read.table.rnew	Fri Jun 27 11:26:15 1997
--- read.table.orig	Fri Jun 27 11:21:20 1997
***************
*** 56,66 ****
  	if (length(as.is) != cols)
  		stop("as.is has the wrong length")
  	for (i in 1:cols) {
! 		if (!as.is[i]) {
! 			data[[i]]<-type.convert(data[[i]])
! 	#		data[[i]] <- type.convert(data[[i]],
! 	#			na.strings = na.strings)
! 		}
  	}
  	# now we determine row names
  	if (missing(row.names)) {
--- 56,64 ----
  	if (length(as.is) != cols)
  		stop("as.is has the wrong length")
  	for (i in 1:cols) {
! 		if (!as.is[i])
! 			data[[i]] <- type.convert(data[[i]],
! 				na.strings = na.strings)
  	}
  	# now we determine row names
  	if (missing(row.names)) {
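
The double handling described above can be illustrated with a small toy model. This is only a sketch: toy_scan() and toy_convert() are hypothetical stand-ins written for this example, not the real scan() and type.convert(), and they assume that the first pass rewrites each na.strings token as the text "NA" and that the second pass falls back to a factor as soon as it meets a field that is neither numeric nor listed in na.strings.

```r
# Hypothetical stand-in for the scan() step: every na.strings token
# is rewritten as the literal text "NA".
toy_scan <- function(fields, na.strings = "NA") {
  fields[fields %in% na.strings] <- "NA"
  fields
}

# Hypothetical stand-in for the type.convert() step: fields listed in
# na.strings count as missing; if everything else looks like a whole
# number the column becomes numeric, otherwise it becomes a factor.
toy_convert <- function(fields, na.strings = "NA") {
  is_missing   <- fields %in% na.strings
  numeric_like <- grepl("^[0-9]+$", fields)
  if (all(is_missing | numeric_like)) {
    out <- rep(NA_real_, length(fields))
    out[!is_missing] <- as.numeric(fields[!is_missing])
    out
  } else {
    factor(fields)
  }
}

fields <- c("1", "2", "3", "4", "?", "6", "7", "8")

# na.strings passed to both steps: the second step no longer finds "?",
# meets the text "NA" instead, and returns a factor.
buggy <- toy_convert(toy_scan(fields, na.strings = "?"), na.strings = "?")

# na.strings passed to the first step only, as in the proposed change.
fixed <- toy_convert(toy_scan(fields, na.strings = "?"))

is.factor(buggy)    # the column is a factor with a level called "NA"
is.na(buggy[5])     # the fifth entry is a level, not a missing value
is.numeric(fixed)   # the column is numeric
is.na(fixed[5])     # the fifth entry is a real missing value
```

In this model the factor returned for buggy has a level named "NA", which would explain why the printed column looks correct while is.na() and complete.cases() report no missing values.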