The help page for mean does not say what happens when one applies mean to a matrix. mean and sd work in an inconsistent way on a matrix so that should at least be documented. Also there should be a See Also to colMeans since that provides the missing column-wise analog to sd.
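To make the report concrete, here is a minimal sketch of the behaviour in question. The matrix `m` and its values are made up for illustration; only `mean` and `colMeans` are taken from the message.

```r
# A small 3 x 2 numeric matrix (illustrative data only).
m <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)

# mean() treats the matrix as one long vector and averages all six elements.
overall <- mean(m)

# colMeans() returns one mean per column instead.
per_column <- colMeans(m)

print(overall)
print(per_column)
```

The first call gives a single number and the second a vector with one entry per column, which is the distinction a See Also entry for colMeans would point readers to.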
>>>>> "Gabor" == Gabor Grothendieck <ggrothendieck at gmail.com>
>>>>>     on Thu, 25 Jan 2007 09:53:49 -0500 writes:

    Gabor> The help page for mean does not say what happens when one
    Gabor> applies mean to a matrix.

    Gabor> mean and sd work in an inconsistent way on a matrix
    Gabor> so that should at least be documented.

You are right (though I think this *was* documented at some point in time).

As a matter of fact, I hate the inconsistencies you've been mentioning, and I think they are very wrong from an S-pedagogical point of view.... both that

    sd(mat)   :<==>  apply(mat, 2, sd)

and

    mean(dfr) :<==>  apply(dfr, 2, mean)

and it just leads to wrong ``analogy conclusions'' by useRs.

I'd vote for deprecating these ``builtin conveniences'' in order to gain consistency and clarity... Though I haven't checked how many CRAN + Bioconductor packages would break if we deactivated these two mis-features ...

Martin

    Gabor> Also there should be a See Also to colMeans since that
    Gabor> provides the missing column-wise analog to sd.
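The two equivalences Martin writes down can be spelled out explicitly. This is a sketch only: `mat` and `dfr` are hypothetical example objects, and the code deliberately avoids calling `sd(mat)` or `mean(dfr)` directly, so it does not assume that the installed R version still provides those conveniences. `sapply(dfr, mean)` is shown as an alternative that skips the conversion to a matrix.

```r
# Illustrative data: a numeric matrix and the corresponding data frame.
mat <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))
dfr <- as.data.frame(mat)

# Explicit column-wise standard deviations (right-hand side of the first equivalence).
sd_by_col <- apply(mat, 2, sd)

# Explicit column-wise means of a data frame (right-hand side of the second equivalence).
mean_by_col_apply <- apply(dfr, 2, mean)

# Same result without coercing the data frame to a matrix first.
mean_by_col_sapply <- sapply(dfr, mean)

print(sd_by_col)
print(mean_by_col_apply)
print(mean_by_col_sapply)
```

Writing the column-wise intent out like this keeps code independent of whichever way the built-in shortcuts behave.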
G'day Gabor,

On Thu, 25 Jan 2007 09:53:49 -0500
"Gabor Grothendieck" <ggrothendieck at gmail.com> wrote:

> The help page for mean does not say what happens when one
> applies mean to a matrix.

Well, not directly. :-)

But the help page of mean says that one of the arguments is:

       x: An R object.  Currently there are methods for numeric data
          frames, numeric vectors and dates.  A complex vector is
          allowed for 'trim = 0', only.

And the `Value' section states:

     For a data frame, a named vector with the appropriate method being
     applied column by column.

     If 'trim' is zero (the default), the arithmetic mean of the values
     in 'x' is computed, as a numeric or complex vector of length one.
     If any argument is not logical (coerced to numeric), integer,
     numeric or complex, 'NA' is returned, with a warning.

Since a matrix is a vector with a dimension attribute, and not a data frame, one can deduce that the second paragraph describes the return value for `mean(x)' when x is a matrix.

As I always tell my students, reading R help pages is a bit of an art. :)

> mean and sd work in an inconsistent way on a matrix so that should at
> least be documented.

Agreed. But it is documented in the help page of sd, which clearly states:

     [....]  If 'x' is a matrix or a data frame, a vector
     of the standard deviation of the columns is returned.

I guess you also want to have it documented in the mean help page?

But then, should `var' also be mentioned in the mean help page? This command also works in a different and inconsistent manner to mean on matrices.

And, of course, there are other subtle inconsistencies in the language used in these help pages. Note that the mean help page talks about "numeric data frames" while the help pages of `var' and `sd' talk about "data frames" only, though all components of the data frame have to be numeric, of course.

> Also there should be a See Also to colMeans since that provides the
> missing column-wise analog to sd.

That's probably a good idea. What would you suggest should be mentioned to provide the column-wise analog of `var'?

Cheers,

        Berwin
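Berwin's deduction, and his remark that `var' behaves differently again, can be checked with a short sketch. The matrix `x` is a made-up example; the code only assumes that `var` applied to a numeric matrix returns the covariance matrix of its columns.

```r
# Illustrative 3 x 2 matrix with named columns.
x <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))

# A matrix is not a data frame, so the "length one" paragraph of ?mean applies:
# dropping the dim attribute does not change the result.
not_a_data_frame <- !is.data.frame(x)
same_mean <- identical(mean(x), mean(as.vector(x)))

# var() on a matrix yields a 2 x 2 covariance matrix, neither a single
# number (like mean) nor one value per column.
v <- var(x)

print(same_mean)
print(v)
```

So on the same matrix `mean` collapses everything to one value while `var` returns a square matrix, which is the second inconsistency raised above.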
Good point. Perhaps what is needed is a Note in ?mean clarifying all this (unless the software itself is reworked as Martin has discussed).

Regarding var(x), one could use sd(x)^2.

On 1/25/07, Berwin A Turlach <statba at nus.edu.sg> wrote:
> What would you suggest should be mentioned to provide the
> column-wise analog of `var'?
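The sd(x)^2 suggestion relies on sd returning one value per column when given a matrix, as the sd help page quoted in the thread describes. The sketch below computes the column standard deviations explicitly with `apply`, so it does not depend on that convenience being present; `x` is a hypothetical example matrix, and the two alternatives shown for comparison are additions, not something proposed in the thread.

```r
# Illustrative 3 x 2 matrix with named columns.
x <- cbind(a = c(1, 2, 3), b = c(10, 20, 30))

# Column-wise variances as squared column-wise standard deviations.
var_from_sd <- apply(x, 2, sd)^2

# Alternative 1: apply var to each column directly.
var_direct <- apply(x, 2, var)

# Alternative 2: take the diagonal of the covariance matrix.
var_from_cov <- diag(var(x))

print(var_from_sd)
print(var_direct)
print(var_from_cov)
```

All three give the same per-column variances; the diagonal route computes the full covariance matrix first, so it does more work than needed on wide matrices.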