Gavin Simpson
2014-Aug-21 18:32 UTC
[Rd] Inconsistent handling of data frames in min(), max(), and mean()
This inconsistency recently came to my attention:

> df <- data.frame(A = 1:10, B = rnorm(10))
> min(df)
[1] -1.768958
> max(df)
[1] 10
> mean(df)
[1] NA
Warning message:
In mean.default(df) : argument is not numeric or logical: returning NA

I recall the times when `mean(df)` would give `colMeans(df)` and this behaviour was deemed inconsistent. It seems, though, that the change has removed one inconsistency and replaced it with another.

Am I missing good reasons why there couldn't be a `mean.data.frame()` method which worked like `max()` etc. when given a data frame? Namely, that they return the required statistic *only* when presented with a data frame of all numeric variables? E.g.

> df <- data.frame(A = 1:10, B = rnorm(10), C = factor(rep(c("A","B"), each = 5)))
> max(df)
Error in FUN(X[[1L]], ...) :
  only defined on a data frame with all numeric variables

I would expect `mean(df)` to fail with the same error as `max(df)` for the new example `df`, but to return the same as `mean(as.matrix(df))` or `mean(colMeans(df))` if given an entirely numeric data frame:

> mean(colMeans(df[, 1:2]))
[1] 2.78366
> mean(as.matrix(df[, 1:2]))
[1] 2.78366
> mean(df[,1:2])
[1] 2.78366

I just can't see the sense in having `mean` work the way it does now?

Thanks,

Gavin

--
Gavin Simpson, PhD

	[[alternative HTML version deleted]]
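The proposal above can be sketched as a user-level S3 method. This `mean.data.frame()` is hypothetical and written only for illustration; it is not part of base R, and its all-numeric check is modelled on the error message that `max(df)` gives rather than on the actual source of `Summary.data.frame`.

```r
## Hypothetical method, NOT part of base R: mean() for data frames that
## succeeds only when every column is numeric (or logical), as max() does.
mean.data.frame <- function(x, ...) {
  # TRUE for each column whose values mean() can sensibly pool
  ok <- vapply(x, function(col) is.numeric(col) || is.logical(col), logical(1))
  if (!all(ok))
    stop("only defined on a data frame with all numeric variables")
  # Pool every cell, i.e. the same answer as mean(as.matrix(x))
  mean(as.matrix(x), ...)
}

set.seed(42)
df <- data.frame(A = 1:10, B = rnorm(10))
mean(df)                    # same value as mean(as.matrix(df))
mean(colMeans(df))          # equal here because both columns have 10 rows

df2 <- data.frame(df, C = factor(rep(c("A", "B"), each = 5)))
try(mean(df2))              # now an error, like max(df2)
```

Pooling all cells weights each cell equally, so the result matches `mean(colMeans(df))` only when every column contributes the same number of non-missing values.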
Martin Maechler
2014-Aug-22 08:23 UTC
[Rd] Inconsistent handling of data frames in min(), max(), and mean()
>>>>> Gavin Simpson <ucfagls at gmail.com>
>>>>>     on Thu, 21 Aug 2014 12:32:31 -0600 writes:

> This inconsistency recently came to my attention:
>> df <- data.frame(A = 1:10, B = rnorm(10))
>> min(df)
> [1] -1.768958
>> max(df)
> [1] 10
>> mean(df)
> [1] NA
> Warning message:
> In mean.default(df) : argument is not numeric or logical: returning NA

I would tend to agree (:-) that mean() should rather give an error here (and read on).

> I recall the times when `mean(df)` would give
> `colMeans(df)` and this behaviour was deemed
> inconsistent.

> It seems, though, that the change has removed one
> inconsistency and replaced it with another.

The whole idea of removing the mean method for data frames was that there are many more summary functions, e.g. median(), and it seems wrong to write a data frame method for each of them; but then why for *some* of them?

So we *did* keep the Summary.data.frame group method, and that's why min(), max(), sum(), ... work {though sum() will be slightly slower than colSums()}.

When teaching R, the audience should learn to use apply() or similar functions, e.g. from the hadleyverse, because that is the general approach for dealing with matrix-like objects, and that is indeed how I think users should start thinking of data frames.

> Am I missing good reasons why there couldn't be a
> `mean.data.frame()` method which worked like `max()` etc.
> when given a data frame?

yes, see above.
[ There's no consistent end after that: why is median() different, why would sd(), var(), ... not work? ]

> Namely, that they return the
> required statistic *only* when presented with a data frame
> of all numeric variables? E.g.

>> df <- data.frame(A = 1:10, B = rnorm(10), C = factor(rep(c("A","B"), each = 5)))
>> max(df)
> Error in FUN(X[[1L]], ...) :
>   only defined on a data frame with all numeric variables

> I would expect `mean(df)` to fail with the same error as
> `max(df)` for the new example `df`, but to
> return the same as `mean(as.matrix(df))` or
> `mean(colMeans(df))` if given an entirely numeric data
> frame:

>> mean(colMeans(df[, 1:2]))
> [1] 2.78366
>> mean(as.matrix(df[, 1:2]))
> [1] 2.78366
>> mean(df[,1:2])
> [1] 2.78366

> I just can't see the sense in having `mean` work the way
> it does now?

I agree. It would be better to give an error. E.g., mean.default could start with

    if(is.object(x))
        stop("there is no mean() method for ", class(x)[1], " objects")

> Thanks,
> Gavin
> --
> Gavin Simpson, PhD

> [[alternative HTML version deleted]]
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ( hmmm... and that on R-devel ... )
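The guard suggested for `mean.default()` can be tried without modifying base R by defining a separate generic. `mean_strict()` is a hypothetical name chosen for illustration; its default method applies the `is.object(x)` check quoted above, and the last lines show the column-wise route recommended instead of a data frame method.

```r
## Hypothetical generic, NOT part of base R. Its default method applies the
## is.object() check proposed for mean.default(), so classed objects with no
## mean_strict() method of their own get an error instead of NA + warning.
mean_strict <- function(x, ...) UseMethod("mean_strict")

mean_strict.default <- function(x, ...) {
  if (is.object(x))
    stop("there is no mean() method for ", class(x)[1], " objects")
  mean.default(x, ...)
}

df <- data.frame(A = 1:10, B = seq(0.5, 5, by = 0.5))

mean_strict(1:10)           # plain vectors behave as with mean()
try(mean_strict(df))        # error: there is no mean() method for data.frame objects

## Column-wise summaries: the general approach for data frames
vapply(df, mean, numeric(1))   # one mean per column
colMeans(df)                   # same values as the line above
sapply(df, median)             # the same pattern works for any summary function
max(df)                        # Summary group: handled by Summary.data.frame
```

Because `mean_strict()` is a separate generic, classes that do have a `mean()` method (e.g. `Date`) would also need a matching `mean_strict()` method in this sketch; placed inside `mean.default()` itself, the check would normally be reached only when no more specific method exists.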