Gavin Simpson
2011-Apr-05 11:33 UTC
[Rd] Inconsistency between rowMeans documentation and reality?
Dear List,

I'm not even sure this is an issue or not, but ?rowMeans has:

Value:

     A numeric or complex array of suitable size, or a vector if the
     result is one-dimensional.  The ‘dimnames’ (or ‘names’ for a
     vector result) are taken from the original array.

     If there are no values in a range to be summed over (after
     removing missing values with ‘na.rm = TRUE’), that component of
     the output is set to ‘0’ (‘*Sums’) or ‘NA’ (‘*Means’), consistent
     with ‘sum’ and ‘mean’.

However the output of mean() and rowMeans() is not exactly the same when
all supplied values are missing.

> mean(NA, na.rm = TRUE)
[1] NaN
> mean(rep(NA, 5), na.rm = TRUE)
[1] NaN
> rowMeans(matrix(rep(NA, 5), ncol = 5), na.rm = TRUE)
[1] NA

So in one sense, the outputs are not consistent:

> is.nan(mean(rep(NA, 5), na.rm = TRUE))
[1] TRUE
> is.nan(rowMeans(matrix(rep(NA, 5), ncol = 5), na.rm = TRUE))
[1] FALSE

but in another they are:

> is.na(mean(rep(NA, 5), na.rm = TRUE))
[1] TRUE
> is.na(rowMeans(matrix(rep(NA, 5), ncol = 5), na.rm = TRUE))
[1] TRUE

I'm not familiar enough with the details to know if this even matters,
but wonder if something in the documentation needs a change or tweak to
clarify what is returned. As I say, in one sense the outputs are not
consistent.

> sessionInfo()
R version 2.13.0 beta (2011-04-04 r55298)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.utf8        LC_COLLATE=en_GB.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_GB.utf8
 [7] LC_PAPER=en_GB.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

loaded via a namespace (and not attached):
[1] tools_2.13.0

Thanks,

Gavin
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Prof Brian Ripley
2011-Apr-11 14:57 UTC
[Rd] Inconsistency between rowMeans documentation and reality?
I suspect you omitted some of the help page:

     As they are written for speed, they blur over some of the
     subtleties of ‘NaN’ and ‘NA’.

So, given that (and that real NA is a specific NaN) I think it is
perfectly reasonable to claim they are consistent with mean.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
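
The distinction discussed in this thread can be sketched in a few lines of R. The values and the helper name `rows_without_data` are illustrative assumptions, not anything from the help page or from either poster. The sketch relies only on the documented behaviour that is.na() is TRUE for both NA and NaN, whereas is.nan() is TRUE only for NaN, so a test built on is.na() should not depend on which of the two rowMeans() happens to return for an all-missing row.

```r
# Illustrative values only; not taken from the thread.
x <- c(NA_real_, NaN, 1)

is.na(x)    # TRUE for both the NA and the NaN element
is.nan(x)   # TRUE only for the NaN element

# Hypothetical helper: flag rows whose mean is undefined, without
# depending on whether rowMeans() reports that case as NA or NaN.
# For finite input, this means every value in the row was missing.
rows_without_data <- function(m) is.na(rowMeans(m, na.rm = TRUE))

m <- rbind(c(1, 2, 3), c(NA, NA, NA))
rows_without_data(m)
```

Code that instead checks is.nan() on the result would be sensitive to exactly the NA versus NaN difference reported above.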