The summary function behaves inconsistently with data frame columns, e.g.

summary(rock)       # max of area 12212, correct
summary(rock$area)  # max of area 12210, incorrect max

I know that

summary(rock$area, digits=5)

will correct the error (I DID read the manual). But my point is the inconsistency, because I get the correct answer without having to add the digits option in the first statement when referring to the full dataframe. This is one of the first functions that beginners use and if they have to RTM and tinker with options before they can get a consistent value for the max of an integer column, it is off-putting to say the least. At worst it confirms the skeptic's suspicion that open-source software is a bit flaky. Would it be out of line to report this to r-bugs -- at least to improve on the documentation?

-jms
r2.13.1 maclion
I have not read the manual, but I drew 10000 random normal vectors and 10000 random Poisson vectors of length 10000 and was unable to reproduce this behavior. Can you provide an example (self-contained code) that reproduces this problem?

Thanks,
Daniel

______________________________________________
R-help@ mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
View this message in context: http://r.789695.n4.nabble.com/inconsistent-behavior-of-summary-function-tp3869906p3870106.html
Sent from the R help mailing list archive at Nabble.com.
On 04/10/11 19:58, Daniel Malter wrote:
> I have not read the manual, but I drew 10000 random normal vectors and 10000
> random Poisson vectors of length 10000 and was unable to reproduce this
> behavior. Can you provide an example (self-contained code) that reproduces
> this problem?

The OP *did* provide a reproducible example. The "rock" data are a built-in data set. See ?rock.

Also the OP is correct!

cheers,
Rolf Turner
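For readers who want to see where 12210 comes from, here is a minimal sketch of the arithmetic. It assumes the behavior described in the original post for R 2.13.1, where the help page for summary describes digits as being used with signif() in the default method and with format() in the data frame method. Later R releases may print 12212 in both cases, so the sketch calls signif() and format() directly rather than relying on what summary() prints. The variable names are illustrative only.

```r
# Sketch only: reproduces the rounding arithmetic, not summary() itself.
# 'rock' ships with R in the datasets package, so no setup is needed.
area_max <- max(rock$area)                  # exact maximum of the integer column

# Default number of significant digits used by summary():
# max(3, getOption("digits") - 3), which is 4 when options(digits) is 7.
default_digits <- max(3, getOption("digits") - 3)

# Rounding to 4 significant digits loses the last digit of a 5-digit integer.
rounded <- signif(area_max, 4)              # 12210

# Formatting to 4 significant digits never drops digits before the decimal point.
formatted <- format(area_max, digits = 4)   # "12212"

print(c(exact = area_max, rounded = rounded))
print(formatted)
```

Under these assumptions the two values differ only because one path rounds the number and the other merely formats it, which matches the 12210 versus 12212 discrepancy reported above.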
You are right, but this is difficult or impossible to really solve. The problem is that summary() is an S3 generic (see ?UseMethod) -- so essentially it can mean anything and do anything depending on the structure to which it's applied. In your case, the structures were a data frame and a vector (that it was a column of the data frame is irrelevant) and, as you noted, different options were used for the two functions. But it could be -- and probably does get -- much worse than that.

The ability to dispatch different methods from a single generic call based on the structure of the object to which a function is applied is generally viewed as a positive feature of OO languages (of which native R has some features). But nothing's perfect.

-- Bert

--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
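The dispatch point can be illustrated with a toy generic. This is only a sketch: describe() and its two methods are hypothetical names invented here, not part of R, and they deliberately mimic the round-versus-format difference discussed in this thread rather than reproduce the real summary() methods.

```r
# Hypothetical generic: UseMethod() picks a method from the class of 'x'.
describe <- function(x, ...) UseMethod("describe")

# Fallback method, used for plain vectors: rounds to 4 significant digits.
describe.default <- function(x, ...) signif(max(x), 4)

# Data frame method: formats each column's maximum instead of rounding it.
describe.data.frame <- function(x, ...) {
  vapply(x, function(col) format(max(col), digits = 4), character(1))
}

vec_result <- describe(rock$area)   # integer vector, so describe.default runs
df_result  <- describe(rock)        # data frame, so describe.data.frame runs

print(vec_result)                   # 12210
print(df_result[["area"]])          # "12212"
```

The same call name gives different answers for the same underlying data because two unrelated functions are doing the work. methods(summary) lists the methods that the real generic can dispatch to in a given session.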