ggrothendieck@yifan.net
2002-Mar-30 19:53 UTC
[R] Inconsistency among mean, median, max, var
I found a strange inconsistency. If m is a matrix and d is a data frame, then

- mean(m), median(m), max(m) and max(d) all return a single value

but

- mean(d) returns the column means
- median(d) fails
- both var(m) and var(d) return the variance covariance matrix

You pretty much have to experiment to figure this out, since much of this
behavior is not readily obvious from the help files. Even after you have
figured it out, it's pretty hard to remember.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
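One way to sidestep the ambiguity described in this post is to state the intended scope in the call itself. The following is only a sketch with made-up data (the values in m are an assumption for illustration; the names m and d follow the post). It relies only on mean, median and var applied to a numeric vector or matrix, plus sapply, apply and diag, so it should not depend on how a particular R version treats data frames.

```r
# Hypothetical example data; the names m and d follow the post.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)   # 3 x 2 numeric matrix
d <- as.data.frame(m)                        # same values as a data frame

# One value over all cells: convert to a matrix first, so the call
# does not depend on how a given function treats data frames.
overall_mean   <- mean(as.matrix(d))
overall_median <- median(as.matrix(d))

# One value per column: ask for it explicitly.
col_means   <- sapply(d, mean)        # applies mean() to each column of d
col_medians <- apply(m, 2, median)    # applies median() over columns of m

# var() on either object gives the variance-covariance matrix;
# the per-column variances are its diagonal.
col_vars <- diag(var(d))
```

Written this way, the reader of the code does not need to remember which functions act columnwise on which argument type.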
ripley@stats.ox.ac.uk
2002-Mar-30 20:25 UTC
[R] Inconsistency among mean, median, max, var
On Sat, 30 Mar 2002 ggrothendieck at yifan.net wrote:

> I found a strange inconsistency:

Well, these do work as documented, and I don't find it even ordinarily
inconsistent.

> If m is a matrix and d is a data frame then
>
> - mean(m), median(m), max(m) and max(d) all return a single value
>
> but
>
> - mean(d) returns the column means
> - median(d) fails
> - both var(m) and var(d) return the variance covariance matrix
>
> You pretty much have to experiment to figure this out since much of this
> behavior is not readily obvious from the help files.

I don't think that is even 1% fair:

?mean clearly says what it does for a data frame.
?median clearly says it only works for numeric vectors.
?var clearly says that it works for `a numeric vector, matrix or data frame'

Whatever is the problem with that?

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
ggrothendieck@yifan.net
2002-Mar-30 23:14 UTC
[R] Inconsistency among mean, median, max, var
Don't get me wrong. I think the R package is great and, in fact, am
personally investing time to learn it. I particularly like its object
oriented nature, data frames (which nicely organize datasets) and the large
and increasing set of packages and interfaces available for it.

I only mention my problems with it in the hope that it will lead to better,
more consistent software. My comments are not a criticism. They are helpful
(hopefully) feedback.

Regarding specifically your query on what is wrong: it's too complex and the
concepts are not orthogonal. Realistically, it's necessary to keep going back
to the documentation or to test it out to figure out what these functions do
if you don't want to make a mistake. You need a decision matrix like this one
just to figure out what you are going to get.

            ----- argument type ------
            matrix          dataframe
    sum     single value    single value
    max     single value    single value
    median  single value    fails

    mean    single value    columnwise
    sd      columnwise      columnwise
    var     varcov mat      varcov mat

My best try at summarizing this is to split it into two sets of rows as shown
above with the following description:

- mean produces a single value on a matrix and acts columnwise on dataframes
- sd works columnwise
- var produces a variance covariance matrix
- others produce a single value except for median which fails on dataframes

It might be an idea to try out more functions just to see how other functions
fit in.

I use another statistical package in which the 12 corresponding functions
have a consistent result (work columnwise).

On 30 Mar 2002 at 20:25, ripley at stats.ox.ac.uk wrote:

> On Sat, 30 Mar 2002 ggrothendieck at yifan.net wrote:
>
> > I found a strange inconsistency:
>
> Well, these do work as documented, and I don't find it even ordinarily
> inconsistent.
>
> > If m is a matrix and d is a data frame then
> >
> > - mean(m), median(m), max(m) and max(d) all return a single value
> >
> > but
> >
> > - mean(d) returns the column means
> > - median(d) fails
> > - both var(m) and var(d) return the variance covariance matrix
> >
> > You pretty much have to experiment to figure this out since much of this
> > behavior is not readily obvious from the help files.
>
> I don't think that is even 1% fair:
>
> ?mean clearly says what it does for a data frame.
> ?median clearly says it only works for numeric vectors.
> ?var clearly says that it works for `a numeric vector, matrix or data
> frame'
>
> Whatever is the problem with that?
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272860 (secr)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
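A decision matrix like the one in the message above could also be generated by experiment rather than by hand. The sketch below is only an illustration: describe_result is a hypothetical helper, the data values are made up, and the category labels are a rough approximation of the ones used in the table. The cells it prints depend on the R version in use and may not match the table as posted, so no expected output is given.

```r
# Hypothetical helper: classify what a summary function returns for x.
describe_result <- function(f, x) {
  res <- tryCatch(f(x),
                  error   = function(e) "fails",   # call stopped with an error
                  warning = function(w) "warns")   # call raised a warning
  if (is.character(res)) return(res)               # sentinel from a handler
  if (is.matrix(res)) {
    "matrix"
  } else if (length(res) == 1) {
    "single value"
  } else if (length(res) == ncol(x)) {
    "columnwise"
  } else {
    "other"
  }
}

# Made-up numeric data: a 4 x 3 matrix and the same values as a data frame.
m <- matrix(c(1, 5, 2, 8, 3, 7, 4, 6, 9, 10, 12, 11), ncol = 3)
d <- as.data.frame(m)

funs <- list(sum = sum, max = max, median = median,
             mean = mean, sd = sd, var = var)

# One column per function, one row per argument type.
decision <- sapply(funs, function(f) {
  c(matrix = describe_result(f, m), dataframe = describe_result(f, d))
})

t(decision)  # transpose so rows are functions, as in the table above
```

More functions can be added to funs to see where they fit, as the message suggests.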