Martin Maechler
2016-Aug-24 09:36 UTC
[Rd] summary.default rounding on numeric seems inconsistent with other R behaviors
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Tue, 23 Aug 2016 14:33:58 +0200 writes:

>>>>> Dirk Eddelbuettel <edd at debian.org>
>>>>>     on Fri, 19 Aug 2016 11:40:05 -0500 writes:

    >> It is the old story of defined behaviour and expected outcomes. Hard to
    >> change now.

    > yes... not impossible though... see below

    >> So I would suggest you do something like this in your ~/.Rprofile:

    R> smry <- function(...) summary(..., digits=6)
    R> smry(155555L)
    >>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    >>  155555  155555  155555  155555  155555  155555
    R>

    >> Maybe call it Summary() instead.

    > yes, do use a different name. There are other such functions, 'summarize()'.

    > Simone wrote

    >> I had raised the matter ten years ago, and I was told that the topic was
    >> already very^3 old
    >>
    >> https://stat.ethz.ch/pipermail/r-devel/2006-September/042684.html
    >>
    >> there is some discussion on its origin and also a declaration of intents to
    >> change the default behaviour, which, unfortunately, remained a declaration.
    >> I agree that R could do better here, let's hope in less than ten years
    >> though. ;-)

    > and the 2006 thread he mentions is basically a similar question
    > and a reply by me that I agreed to some extent that a change was
    > desirable ... originally we had adhered to the S "standard"
    > which became the S+ one and at that time I did still have access
    > to a running instance of S-PLUS 6.2 where I had seen that
    > Insightful (the company curating and selling S-PLUS)
    > also had decided to change the ~15 year old S "standard"... and
    > indeed I was implicitly *asking* for proposals of such a change,
    > but I think I never saw a (careful) proposal.

    > In the spirit of probably 99% of other "base R" code, a change
    > should really *not* round __at all__ in the summary() methods,
    > but *only* in the print() methods of such summary() results.

    > OTOH, for back compatibility, if a user does use summary(.., digits=.)
    > explicitly, these digits should be 'obeyed' of course.

    > I think summary(<1-variable>) could easily, and relatively "back-compatibly"
    > be changed in the above vein.

    > One "real problem" is the wrong decision (also from S and S-PLUS
    > times IIRC) to return a "character" matrix for
    >       summary(<data.frame>, ..)
    > or    summary(<matrix>, ..)
    > (For a data frame, I think it should return a list() of
    >  single-variable summary()es, or then a numeric matrix .. in
    >  both cases have a good print() method)
    > because when you return a character matrix, all the numbers are
    > already rounded, ... and if we follow the above approach they
    > would have to be rounded further... ``the horror''

    > I wonder how much code out there is relying on the internal
    > structure of summary(<data.frame>).. because that is the one
    > part I'd definitely want to change, too.

[Talking to myself .. ;-)]
Yes, but that's the tough part to change.

This thread's topic is really only about changing summary.default(),
and I have started testing such a change now, and that does seem
very sensible:

 - No rounding in summary.default(), but
 - (almost) back-compatible rounding in its print() method.

My current plan is to commit this to R-devel in a day or so,
unless unforeseen issues emerge.

Martin
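The plan described above (no rounding when the summary is computed, rounding only when it is printed) can be sketched with a toy S3 class. This is only an illustration under stated assumptions: the names my_summary() and print.mysummary() and the class "mysummary" are hypothetical and are not the code that was tested for R-devel.

```r
# Hypothetical toy class "mysummary"; NOT base R's summary.default().
# The summary object stores unrounded values.
my_summary <- function(x) {
  qq <- stats::quantile(x)
  out <- c(qq[1:3], mean(x), qq[4:5])
  names(out) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")
  class(out) <- "mysummary"
  out
}

# Rounding to 'digits' significant digits happens only for display.
print.mysummary <- function(x, digits = 4L, ...) {
  print(signif(unclass(x), digits), ...)
  invisible(x)
}

s <- my_summary(155555L)
print(s)                # displayed values are rounded to 4 significant digits
unclass(s)[["Max."]]    # the stored value is still the exact 155555
```

With this split, code that computes on the summary object sees the exact numbers, while the printed table keeps a compact look.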
John Mount
2016-Aug-24 14:25 UTC
[Rd] summary.default rounding on numeric seems inconsistent with other R behaviors
> On Aug 24, 2016, at 2:36 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>
> [Talking to myself .. ;-)]
> Yes, but that's the tough part to change.
>
> This thread's topic is really only about changing summary.default(),
> and I have started testing such a change now, and that does seem
> very sensible:
>
>  - No rounding in summary.default(), but
>  - (almost) back-compatible rounding in its print() method.
>
> My current plan is to commit this to R-devel in a day or so,
> unless unforeseen issues emerge.
>
> Martin

That is potentially a very good outcome. Thank you so much for producing and testing a patch.

---------------
John Mount
http://www.win-vector.com/
Our book: Practical Data Science with R
http://www.manning.com/zumel/
Martin Maechler
2016-Aug-25 20:11 UTC
[Rd] summary.default rounding on numeric seems inconsistent with other R behaviors
>>>>> John Mount <jmount at win-vector.com>
>>>>>     on Wed, 24 Aug 2016 07:25:50 -0700 writes:

    >> On Aug 24, 2016, at 2:36 AM, Martin Maechler
    >> <maechler at stat.math.ethz.ch> wrote:
    >>
    >> [Talking to myself .. ;-)]
    >> Yes, but that's the tough part to change.
    >>
    >> This thread's topic is really only about changing
    >> summary.default(), and I have started testing such a
    >> change now, and that does seem very sensible:
    >>
    >>  - No rounding in summary.default(), but
    >>  - (almost) back-compatible rounding in its print() method.
    >>
    >> My current plan is to commit this to R-devel in a day or
    >> so, unless unforeseen issues emerge.
    >>
    >> Martin

    > That is potentially a very good outcome. Thank you so
    > much for producing and testing a patch.

I have now committed such a change to R-devel:

------------------------------------------------------------------------
r71150 | maechler | 2016-08-25 21:57:19 +0200 (Thu, 25 Aug 2016) | 1 line
Changed paths:
   M /trunk/doc/NEWS.Rd
   M /trunk/src/library/base/R/summary.R
   M /trunk/src/library/base/man/summary.Rd
   M /trunk/src/library/stats/R/ecdf.R
   M /trunk/tests/Examples/stats-Ex.Rout.save
   M /trunk/tests/reg-tests-2.Rout.save

summary.default() no longer rounds by default; just *prints* rounded
------------------------------------------------------------------------

I do expect quite a few packages giving slightly changed output,
typically uniformly not-worse one, but just "typically".

Note that I did also have to patch stats:::print.summary.ecdf()
because that had relied on the fact that summary(<numeric>) did
round itself already.
Other useR's code may need similar changes... and so this *is* a
user visible change, listed accordingly in NEWS (the above
doc/NEWS.Rd in the sources).

I hope very much that the overall and longer term benefit will
vastly outweigh the nuisance (to people publishing, e.g.) that quite a
few "basic" outputs will slightly change.

The benefit for maintainers and old timers like me will be that
we will not need to answer this (non-official) FAQ nor excuse a
peculiar behavior in the future .....

But yes, I expect a flurry of questions starting in April 2017,
and hope that the smart readers of this list will share the load
answering them .. ;-)

Martin Maechler
ETH Zurich
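For code that, like stats:::print.summary.ecdf(), relied on summary(<numeric>) returning already-rounded values, one possible adaptation is to round explicitly where rounded numbers are needed. The following is a minimal sketch, not the actual patch: the data vector and the choice of 4 significant digits are assumptions made for illustration.

```r
# Illustrative data; any numeric vector would do.
x <- c(1.23456789, 2.3456789, 3.456789)

s <- summary(x)

# Round explicitly instead of relying on summary() to have rounded already.
# signif() is idempotent at a fixed number of digits, so this gives the same
# values whether summary() rounded to 4 significant digits or not at all.
vals <- signif(unclass(s), 4)
vals[["Mean"]]
```

Writing the rounding step out this way keeps such code independent of which R version computed the summary.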