Hi all,

I ran into a problem in some of my code that could be traced back to 'mean'
sometimes returning NA and sometimes NaN, depending on the value of na.rm:

> mean(c())
[1] NA
> mean(c(NA),na.rm=T)
[1] NaN

However, I don't understand the reasoning behind this and would appreciate an
explanation.

I understand that the mean of an empty vector is not defined, but I don't
understand why it matters whether the vector was empty from the beginning or
only after removing the NAs.

Pascal Niklaus

Pascal A. Niklaus wrote:
> Hi all,
>
> I ran into a problem in some of my code that could be traced back to 'mean'
> sometimes returning NA and sometimes NaN, depending on the value of na.rm:
>
>> mean(c())
> [1] NA
>
>> mean(c(NA),na.rm=T)
> [1] NaN
>
> However, I don't understand the reasoning behind this and would appreciate an
> explanation.
>
> I understand that the mean of an empty vector is not defined,

Not so, it is well-defined as 0/0 = NaN.

> but I don't
> understand why it matters whether the vector was empty from the beginning

You didn't try that case: mean(numeric(0)) is also NaN. The issue is that

> typeof(c())
[1] "NULL"

is not numeric (not even a vector), and so mean() of it is undefined.

> or only after removing the NAs.

Speculation (and wrong).

> Pascal Niklaus
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

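For readers following the thread, the type distinction described in this reply can be checked directly at the console. This is only a minimal sketch: the variable names are illustrative and do not come from the original posts, and the commented results assume a reasonably recent R version.

```r
# The three inputs discussed in the thread (names are illustrative).
empty_null    <- c()          # c() with no arguments is NULL, not a vector
empty_numeric <- numeric(0)   # a genuine zero-length numeric vector
only_na       <- c(NA)        # a length-one logical vector

typeof(empty_null)      # "NULL"
typeof(empty_numeric)   # "double"
typeof(only_na)         # "logical"

# mean() of NULL warns that the argument is not numeric or logical
# and returns NA; the warning is suppressed here to keep the output short.
m_null <- suppressWarnings(mean(empty_null))

# Zero-length numeric input gives NaN.
m_numeric <- mean(empty_numeric)

# With na.rm = TRUE the NA is dropped first, leaving a zero-length
# logical vector, so the result is NaN as well.
m_na_removed <- mean(only_na, na.rm = TRUE)

# is.na() is TRUE for both NA and NaN; is.nan() is TRUE only for NaN.
c(is.na(m_null), is.nan(m_null))               # TRUE FALSE
c(is.na(m_numeric), is.nan(m_numeric))         # TRUE TRUE
c(is.na(m_na_removed), is.nan(m_na_removed))   # TRUE TRUE
```

Code that only needs to know whether a usable mean came back can test with is.na(), which covers both outcomes.
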
Pascal A. Niklaus wrote:
> Hi all,
>
> I ran into a problem in some of my code that could be traced back to 'mean'
> sometimes returning NA and sometimes NaN, depending on the value of na.rm:
>
>> mean(c())
> [1] NA
>
>> mean(c(NA),na.rm=T)
> [1] NaN
>
> However, I don't understand the reasoning behind this and would appreciate an
> explanation.
>

note the types:

    typeof(c())
    typeof(c(NA))
    typeof(c(NA)[-na.omit(c(NA))])

now,

    mean(NULL)
    mean(logical(0))

    mean(c())
    # NA, because you take the mean of a vector of non-{numeric,logical} type
    # (see the warning message)

    mean(c(NA), na.rm=TRUE)
    # NaN, because you take the mean of a zero-length logical vector

    mean(c(NA), na.rm=FALSE)
    # NA, because you take the mean of a logical vector containing an NA

you can argue that ?mean underspecifies this (it doesn't say anything about the
value for a zero-length logical, numeric, or complex vector, though you can
guess it will be the value of 0/0).

vQ
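
The "value of 0/0" guess can be made concrete by writing the mean as sum over count. The sketch below is illustrative only: naive_mean and safe_mean are hypothetical helpers, not part of base R, and safe_mean shows just one possible convention (always NA_real_ when there is nothing to average) for code that needs a single, predictable result.

```r
# Hypothetical helper: the textbook mean, sum divided by count.
naive_mean <- function(x) sum(x) / length(x)

x <- numeric(0)
sum(x)          # 0
length(x)       # 0
naive_mean(x)   # NaN, i.e. 0/0
mean(x)         # NaN as well

# Hypothetical wrapper: return NA_real_ whenever nothing is left to
# average, whether the input was NULL, empty, or all NA with na.rm = TRUE.
safe_mean <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  if (length(x) == 0L) return(NA_real_)
  mean(x)
}

safe_mean(c())                        # NA
safe_mean(c(NA), na.rm = TRUE)        # NA
safe_mean(c(1, 2, NA), na.rm = TRUE)  # 1.5
```

Whether NA or NaN is the better sentinel is a design choice; the point is only that the caller picks one instead of depending on the input's type.
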