Gavin Simpson
2011-Apr-05 11:33 UTC
[Rd] Inconsistency between rowMeans documentation and reality?
Dear List,
I'm not even sure whether this is an issue, but ?rowMeans has:
Value:
     A numeric or complex array of suitable size, or a vector if the
     result is one-dimensional. The ‘dimnames’ (or ‘names’ for a
     vector result) are taken from the original array.
     If there are no values in a range to be summed over (after
     removing missing values with ‘na.rm = TRUE’), that component of
     the output is set to ‘0’ (‘*Sums’) or ‘NA’ (‘*Means’), consistent
     with ‘sum’ and ‘mean’.
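The ‘*Sums’ half of that paragraph is easy to check directly: with na.rm = TRUE, a row with no non-missing values sums to 0, just as sum() of an empty vector does. A minimal sketch; the matrix x is purely illustrative:

```r
## Illustrative 2 x 3 matrix whose second row is entirely missing
x <- rbind(c(1, 2, NA),
           c(NA, NA, NA))

rowSums(x, na.rm = TRUE)    # 3 0: the all-missing row sums to 0
sum(x[2, ], na.rm = TRUE)   # 0: the sum of an empty vector
```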
However, the outputs of mean() and rowMeans() are not exactly the same when
all supplied values are missing.
> mean(NA, na.rm = TRUE)
[1] NaN
> mean(rep(NA, 5), na.rm = TRUE)
[1] NaN
> rowMeans(matrix(rep(NA, 5), ncol = 5), na.rm = TRUE)
[1] NA
So in one sense, the outputs are not consistent:
> is.nan(mean(rep(NA, 5), na.rm = TRUE))
[1] TRUE
> is.nan(rowMeans(matrix(rep(NA, 5), ncol = 5), na.rm = TRUE))
[1] FALSE
but in another they are:
> is.na(mean(rep(NA, 5), na.rm = TRUE))
[1] TRUE
> is.na(rowMeans(matrix(rep(NA, 5), ncol = 5), na.rm = TRUE))
[1] TRUE
I'm not familiar enough with the details to know if this even matters,
but wonder if something in the documentation needs a change or tweak to
clarify what is returned. As I say, in one sense the outputs are not
consistent.
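If downstream code needs the all-missing case to behave exactly like mean(x, na.rm = TRUE), one option is to patch the empty rows explicitly instead of relying on whatever rowMeans() returns for them. A minimal sketch; row_means_nan() is a hypothetical helper, not part of base R:

```r
## Hypothetical helper (not in base R): row means that return NaN, as
## mean(x, na.rm = TRUE) does, for rows with no non-missing values,
## regardless of whether rowMeans() itself gives NA or NaN there.
row_means_nan <- function(x) {
  out <- rowMeans(x, na.rm = TRUE)
  empty <- rowSums(!is.na(x)) == 0   # rows where every value is NA or NaN
  out[empty] <- NaN
  out
}

x <- rbind(c(1, 2, NA),
           c(NA, NA, NA))
row_means_nan(x)
apply(x, 1, mean, na.rm = TRUE)      # slower reference: mean() on each row
```

Both calls give 1.5 for the first row and NaN for the second; the helper keeps the speed of rowMeans() and only pays for one extra rowSums() pass to find the empty rows.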
> sessionInfo()
R version 2.13.0 beta (2011-04-04 r55298)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8
[7] LC_PAPER=en_GB.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] tools_2.13.0
Thanks,
Gavin
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson [t] +44 (0)20 7679 0522
ECRC, UCL Geography, [f] +44 (0)20 7679 0565
Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Prof Brian Ripley
2011-Apr-11 14:57 UTC
[Rd] Inconsistency between rowMeans documentation and reality?
I suspect you omitted some of the help page:
     As they are written for speed, they blur over some of the
     subtleties of ‘NaN’ and ‘NA’.
So, given that (and that real NA is a specific NaN) I think it is
perfectly reasonable to claim they are consistent with mean.
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
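The parenthetical "real NA is a specific NaN" can be seen from the predicates themselves: R stores a double NA as an IEEE NaN with a particular payload, so is.na() is TRUE for both values and only is.nan() tells them apart. A small sketch:

```r
## NA_real_ is R's double NA; NaN is the IEEE "not a number" value
vals <- c(na = NA_real_, nan = NaN)

typeof(vals)   # "double": both share the same numeric storage
is.na(vals)    # TRUE TRUE: is.na() counts NaN, and hence NA, as missing
is.nan(vals)   # FALSE TRUE: only is.nan() separates NaN from NA
```

This is why the two checks in the original report disagree: code that tests with is.na() sees the same answer from mean() and rowMeans(), while code that tests with is.nan() does not.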