gregory_r_warnes@groton.pfizer.com
2002-Oct-09 16:54 UTC
problems with missing values created by conversion using as.matri (PR#2130)
> version
         _
platform sparc-sun-solaris2.8
arch     sparc
os       solaris2.8
system   sparc, solaris2.8
status
major    1
minor    6.0
year     2002
month    10
day      01
language R

------------------------------

Create a very simple data frame containing a factor and a numeric vector,
each containing a missing value:

> x <- data.frame( a=c("",NA), b=c(1,NA) )

Conversion to a matrix treats the two missing values differently:

> as.matrix(x)
  a  b
1 "" " 1"
2 NA "NA"

The missing value in the factor variable has been correctly converted to a
missing value, while the missing value in the numeric vector has been
incorrectly converted to the string "NA", which is not recognized as a
missing value:

> is.na(as.matrix(x))
      a     b
1 FALSE FALSE
2  TRUE FALSE

This turned up because I was using apply to check for rows containing only
blank or missing values:

> all.blank <- function(x) all( is.na(x) | (x <= " ") )
> blanks <- apply(x, 1, all.blank)
> blanks
    1     2
FALSE FALSE

This should have yielded

> blanks
    1     2
FALSE  TRUE

BTW, direct conversion using as.character doesn't show any problems when
applied to the individual columns:

> as.character(x$a)
[1] "" NA
> as.character(x$b)
[1] "1" NA

I think the problem is that as.matrix.data.frame is using format() to
convert non-character columns to character, which produces the string "NA"
rather than a missing value. Why isn't it using as.character() for this?

For completeness, here's the patch to make this change, but I have not
explored what other side effects it might have.

*** R-1.6.0/src/library/base/R/dataframe.R	Thu Aug 29 03:41:42 2002
--- R-1.6.0-GRW//src/library/base/R/dataframe.R	Wed Oct  9 12:29:11 2002
***************
*** 931,937 ****
  		if (is.character(X[[j]])) next
  		xj <- X[[j]]
! 		X[[j]] <- if(length(levels(xj))) as.vector(xj) else format(xj)
  	    }
  	}
  	X <- unlist(X, recursive = FALSE, use.names = FALSE)
--- 931,937 ----
  		if (is.character(X[[j]])) next
  		xj <- X[[j]]
! 		X[[j]] <- if(length(levels(xj))) as.vector(xj) else as.character(xj)
  	    }
  	}
  	X <- unlist(X, recursive = FALSE, use.names = FALSE)

-Greg

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
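Until the conversion is changed, the blank-row check described above can avoid as.matrix() altogether by testing each column separately. The following is only a sketch: is_blank is a hypothetical helper name, and it tests trimws(s) == "" instead of the original x <= " " comparison, so it is a substitute for all.blank rather than a copy of it.

```r
x <- data.frame(a = c("", NA), b = c(1, NA))

# Hypothetical helper: TRUE where an entry is missing or blank.
# as.character() keeps NA as NA, unlike the format() call used
# inside as.matrix.data.frame().
is_blank <- function(col) {
  s <- as.character(col)
  is.na(s) | trimws(s) == ""
}

# Test every column, then combine the results row by row:
# a row counts as blank only if all of its entries are blank or missing.
blank_cols <- lapply(x, is_blank)
blanks <- Reduce(`&`, blank_cols)
blanks
```

Because no matrix conversion takes place, the NA in the numeric column is still seen as missing when the row is tested.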
Thomas Lumley
2002-Oct-09 21:21 UTC
problems with missing values created by conversion using as.matri (PR#2130)
On Wed, 9 Oct 2002 gregory_r_warnes@groton.pfizer.com wrote:

> ------------------------------
>
> Create a very simple data frame containing a factor and a numeric vector,
> each containing a missing value:
>
> > x <- data.frame( a=c("",NA), b=c(1,NA) )
>
> Conversion to a matrix treats the two missing values differently:
>

Yes, as.matrix.data.frame() uses format() to make character strings, and it
needs to check for NA as well.

	-thomas
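One way to read the suggestion above is to keep format() for its formatting but restore the missing values afterwards. This is a minimal sketch of that idea, not the change actually made to R; format_keep_na is a hypothetical name that does not exist in base R.

```r
# Hypothetical helper (not part of base R): format a vector as character
# strings, then put NA back wherever the input was missing.
format_keep_na <- function(v) {
  out <- format(v)
  out[is.na(v)] <- NA_character_
  out
}

b <- c(1, NA)

is.na(format(b))          # format() alone: the NA has become the string "NA"
is.na(format_keep_na(b))  # the missing value is still missing
```

Compared with switching to as.character(), this keeps the common width and number formatting that format() applies to the non-missing entries.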