John Mount
2016-Aug-19 15:04 UTC
[Rd] summary.default rounding on numeric seems inconsistent with other R behaviors
I was wondering if it would make sense to change the default behavior of the following:

summary(15555L)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15560 15560 15560 15560 15560 15560

summary.default on numeric values rounds the values themselves (not just their presentation) to getOption("digits") - 3L significant digits (that is, four with the default digits = 7). This makes those values surprising and less suitable for further calculation. summary on a matrix or data.frame does not do this.

It would be nice to have

x = 15555L; summary(x)[['Min.']] == min(x)

evaluate to TRUE. I know one can alter the behavior by changing the global 'digits' option, but I don't know what other impacts that might have. Ideally, summary.default would not round its values at all, and would use digits only to control presentation (by overriding print and such).

Even in presentation, rounding without switching to scientific notation (such as 1.556e+4) is a bit surprising. I understand that rounding and scientific notation are two different presentation issues, but new users are very confused when something that appears to be an integer has been rounded.

Example:

summary(data.frame(x=15555))
## x
## Min. :15555
## 1st Qu.:15555
## Median :15555
## Mean :15555
## 3rd Qu.:15555
## Max. :15555
summary(15555)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15560 15560 15560 15560 15560 15560

I have a (somewhat whiny) polemic trying to explain the pain point here:
http://www.win-vector.com/blog/2016/08/my-criticism-of-r-numeric-summary/
(I am not trying to be rude; rather, I am trying to emphasize why this can be confusing to new users.)

---------------
John Mount
http://www.win-vector.com/
Our book: Practical Data Science with R http://www.manning.com/zumel/
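To make the arithmetic concrete: with the default options(digits = 7), the default digits argument of summary.default in the R version under discussion, max(3L, getOption("digits") - 3L), works out to 4 significant digits, and signif() at 4 digits turns 15555 into 15560. The sketch below shows that rounding step and a hypothetical helper, raw_summary() (an illustration, not part of base R), that computes the same six statistics with quantile() and mean() and leaves them unrounded. R 3.4.0 later changed summary.default to round only when printing, so the output of summary() itself depends on the R version and is deliberately not relied on here.

```r
x <- 15555

# Default digits used by summary.default in the version discussed: 4 when digits = 7
default_digits <- max(3L, getOption("digits") - 3L)
signif(x, default_digits)  # 15555 rounded to 4 significant digits gives 15560

# Hypothetical helper (not base R): the same six statistics, with no rounding
raw_summary <- function(x) {
  qq <- stats::quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  c(Min. = qq[1], `1st Qu.` = qq[2], Median = qq[3],
    Mean = mean(x), `3rd Qu.` = qq[4], Max. = qq[5])
}

s <- raw_summary(x)
s[["Min."]] == min(x)  # compares the unrounded minimum with min(x)
```

Because raw_summary() never rounds, any desired digits can be applied later at print time, for example with format(s, digits = 4), which is the separation between computation and presentation argued for above.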
Jim Porzak
2016-Aug-19 15:23 UTC
Concur. I would argue the issue is more critical when sharing results (say, summary() in an R Markdown document) with our business partners.

--
Best,
Jim Porzak
DS4CI.org
LinkedIn.com/in/JimPorzak
use R! Group SF: meetup.com/R-Users/
R Beginners, Berkeley: meetup.com/r-enthusiasts/
Simone Giannerini
2016-Aug-19 16:24 UTC
John,

I had raised the matter ten years ago, and I was told that the topic was already very^3 old:

https://stat.ethz.ch/pipermail/r-devel/2006-September/042684.html

That thread has some discussion of its origin, and also a declaration of intent to change the default behaviour, which, unfortunately, remained a declaration. I agree that R could do better here; let's hope it takes less than ten years this time. ;-)

Kind regards,

Simone

--
Simone Giannerini
Dipartimento di Scienze Statistiche "Paolo Fortunati"
Universita' di Bologna
Via delle belle arti 41 - 40126 Bologna, ITALY
Tel: +39 051 2098262 Fax: +39 051 232153
http://www2.stat.unibo.it/giannerini/
Dirk Eddelbuettel
2016-Aug-19 16:40 UTC
It is the old story of defined behaviour and expected outcomes. Hard to change now.

So I would suggest you do something like this in your ~/.Rprofile:

R> smry <- function(...) summary(..., digits=6)
R> smry(155555L)
Min. 1st Qu. Median Mean 3rd Qu. Max.
155555 155555 155555 155555 155555 155555
R>

Maybe call it Summary() instead.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org