Jim Robison-Cox
2005-Jun-13 17:05 UTC
[R] To many NA's from mean(..., na.rm=T) when a column is all NA's
Dear R-help folks,

I am seeing unexpected behaviour from the function mean with option na.rm = TRUE (which is removing a whole column of a data frame or matrix).

Example:

    testcase <- data.frame( x = 1:3, y = rep(NA,3))

    mean(testcase[,1], na.rm=TRUE)
    [1] 2
    mean(testcase[,2], na.rm = TRUE)
    [1] NaN

OK, so far that seems sensible. Now I'd like to compute both means at once:

    lapply(testcase, mean, na.rm=T)   ## this works
    $x
    [1] 2

    $y
    [1] NaN

But I thought that this would also work:

    apply(testcase, 2, mean, na.rm=T)
     x  y
    NA NA
    Warning messages:
    1: argument is not numeric or logical: returning NA in: mean.default(newX[, i], ...)
    2: argument is not numeric or logical: returning NA in: mean.default(newX[, i], ...)

Summary:
If I have a data frame or a matrix where one entire column is NA's, mean(x, na.rm=T) works on that column, returning NaN, but fails using apply, in that apply returns NA for ALL columns. lapply works fine on the data frame.

If you wonder why I'm building data frames with columns that could be all missing -- they arise as output of a simulation. The fact that the entire column is missing is informative in itself.

I do wonder if this is a bug.

Thanks,
Jim

Jim Robison-Cox
Department of Math Sciences, 2-214 Wilson Hall
Montana State University, Bozeman, MT 59717-2400
phone: (406)994-5340, FAX: (406)994-1789
e-mail: jimrc at math.montana.edu
Sundar Dorai-Raj
2005-Jun-13 17:19 UTC
[R] To many NA's from mean(..., na.rm=T) when a column is all NA's
Jim Robison-Cox wrote:

> testcase <- data.frame( x = 1:3, y = rep(NA,3))
> [...]
> apply(testcase, 2, mean, na.rm=T)
>  x  y
> NA NA
> [...]
> Summary:
> If I have a data frame or a matrix where one entire column is NA's,
> mean(x, na.rm=T) works on that column, returning NaN, but fails using
> apply, in that apply returns NA for ALL columns.
> lapply works fine on the data frame.

Did you try this with a "matrix" or just a data.frame?

> [...]
> I do wonder if this is a bug.

Your problem is not ?apply, but ?as.matrix, which apply calls. Hint: try as.matrix(testcase) and see what it returns.

If you need a matrix, why construct a data.frame? The following will give you what you want:

    x <- matrix(c(1:3, rep(NA, 3)), ncol = 2)
    apply(x, 2, mean, na.rm = TRUE)

or better yet,

    colMeans(x, na.rm = TRUE)

Note that colMeans may give NA instead of NaN for column 2. See ?colMeans for an explanation.

HTH,

--sundar
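The suggestion above can be sketched as a short script. This is only an illustration, not part of the original reply: the names col_means_apply, col_means_fast and col_means_df are made up here. The last call is an alternative that works on the data frame one column at a time (like the lapply() call in the question), so it should never go through as.matrix().

```r
# Matrix with an all-NA second column, built as in the reply above.
# c(1:3, rep(NA, 3)) is an integer vector, so the matrix is numeric.
x <- matrix(c(1:3, rep(NA, 3)), ncol = 2)

# Column means via apply(): each column is passed to mean() as a vector.
col_means_apply <- apply(x, 2, mean, na.rm = TRUE)

# Column means via colMeans(), which is written for speed.
col_means_fast <- colMeans(x, na.rm = TRUE)

# Column-by-column alternative on the original data frame;
# sapply() hands each column to mean() without building a matrix.
testcase <- data.frame(x = 1:3, y = rep(NA, 3))
col_means_df <- sapply(testcase, mean, na.rm = TRUE)

print(col_means_apply)
print(col_means_fast)
print(col_means_df)
```

Whether an empty column shows up as NA or NaN is easiest to check with is.na(), which is TRUE for both.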
Peter Dalgaard
2005-Jun-13 17:19 UTC
[R] To many NA's from mean(..., na.rm=T) when a column is all NA's
Jim Robison-Cox <jimrc at math.montana.edu> writes:

> Summary:
> If I have a data frame or a matrix where one entire column is NA's,
> mean(x, na.rm=T) works on that column, returning NaN, but fails using
> apply, in that apply returns NA for ALL columns.
> lapply works fine on the data frame.
>
> If you wonder why I'm building data frames with columns that could be
> all missing -- they arise as output of a simulation. The fact that the
> entire column is missing is informative in itself.
>
> I do wonder if this is a bug.

It isn't... Cutting a long story short:

> testcase <- data.frame( x = 1:3, y = rep(NA,3))
> as.matrix(testcase)
  x   y
1 "1" NA
2 "2" NA
3 "3" NA
> testcase <- data.frame( x = 1:3, y = as.numeric(rep(NA,3)))
> as.matrix(testcase)
  x  y
1 1 NA
2 2 NA
3 3 NA
> apply(testcase,2,mean,na.rm=T)
  x   y
  2 NaN

--
Peter Dalgaard, Øster Farimagsgade 5, Entr.B
Dept. of Biostatistics, PO Box 2099, 1014 Cph. K
University of Copenhagen, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)
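The session above comes down to the class of the all-NA column. A minimal sketch of how one might check and fix that before calling apply() follows. The names col_classes, fixed, fixed_classes, m and result are illustrative only; NA_real_ is base R's numeric missing value and is used here as an assumed equivalent of as.numeric(NA).

```r
# Original construction: rep(NA, 3) is a logical vector, so column y is logical.
testcase <- data.frame(x = 1:3, y = rep(NA, 3))
col_classes <- sapply(testcase, class)
print(col_classes)

# Declare the missing column as numeric up front.
fixed <- data.frame(x = 1:3, y = rep(NA_real_, 3))
fixed_classes <- sapply(fixed, class)
print(fixed_classes)

# With only numeric columns, as.matrix() yields a numeric matrix,
# so apply() can pass numeric vectors to mean().
m <- as.matrix(fixed)
result <- apply(fixed, 2, mean, na.rm = TRUE)
print(result)
```

For simulation output, creating the placeholder columns with NA_real_ from the start keeps an all-missing column numeric without a later conversion step.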
Achim Zeileis
2005-Jun-13 17:24 UTC
[R] To many NA's from mean(..., na.rm=T) when a column is all NA's
On Mon, 13 Jun 2005 11:05:46 -0600 (MDT) Jim Robison-Cox wrote:

> Dear R-help folks,
>
> I am seeing unexpected behaviour from the function mean
> with option na.rm =TRUE (which is removing a whole column of a data
> frame or matrix.
>
> example:
>
> testcase <- data.frame( x = 1:3, y = rep(NA,3))

In addition to what Sundar already wrote: in the code above x is numeric and y is logical, hence as.matrix() will not do what you want. It creates a "character" matrix instead. It is probably more appropriate to do

    testcase <- data.frame( x = 1:3, y = as.numeric(rep(NA,3)))

hth,
Z

> [...]
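To see why a character matrix makes apply() return NA for every column, here is a small sketch. It is not from the thread: how as.matrix() treats a logical column may differ between R versions, so the example uses a hypothetical character column (label) to force the character matrix. All object names (mixed, m, res, numeric_cols, safe) are illustrative.

```r
# A data frame mixing a numeric column with a character column (hypothetical).
mixed <- data.frame(x = 1:3, label = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

# as.matrix() must pick one type for the whole matrix; here it is character,
# so even the numeric column x is stored as strings.
m <- as.matrix(mixed)
print(is.character(m))

# apply() converts the data frame with as.matrix() first, so mean() receives
# character vectors and returns NA for every column (with warnings).
res <- suppressWarnings(apply(mixed, 2, mean, na.rm = TRUE))
print(res)

# One possible guard: keep only the numeric columns before taking means.
numeric_cols <- sapply(mixed, is.numeric)
safe <- colMeans(mixed[numeric_cols], na.rm = TRUE)
print(safe)
```

The guard silently drops non-numeric columns, which may or may not be what a simulation needs; converting the columns to numeric, as suggested above, keeps them in the result.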