Sietse Brouwer
2017-May-18 20:50 UTC
[Rd] Bug: floating point bug in nclass.FD can cause hist() to crash
Hello everybody,

This is a bug involving functions in core R packages:
graphics::hist.default, grDevices::nclass.FD, and
base::pretty.default. It is not yet on Bugzilla. I cannot submit it
myself, as I do not have an account. Could somebody else add it for
me, perhaps? That would be much appreciated.

Kind regards,

Sietse
Sietse Brouwer


Summary
-------

Floating point errors can cause a data vector to have an ultra-small
inter-quartile range, which causes `grDevices::nclass.FD` to suggest
an absurdly large number of breaks to `graphics::hist(breaks="FD")`.
Because this large float becomes NA when converted to integer, hist's
call to `base::pretty` crashes.

How could nclass.FD, which has the job of suggesting a reasonable number of
classes, avoid suggesting an absurdly large number of classes when the
inter-quartile range is absurdly small compared to the range?


Steps to reproduce
------------------

    hist(c(1, 1, 1, 1 + 1e-15, 2), breaks="FD")


Observed behaviour
------------------

Running this code gives the following error message:

    Error in pretty.default(range(x), n = breaks, min.n = 1):
      invalid 'n' argument
    In addition: Warning message:
    In pretty.default(range(x), n = breaks, min.n = 1) :
      NAs introduced by coercion to integer range


Expected behaviour
------------------

That hist() should never crash when given valid numerical data. Specifically,
that it should be robust even to those rare datasets where (through floating
point inaccuracy) the inter-quartile range is tens of orders of magnitude
smaller than the range.


Analysis
--------

Dramatis personae:

* graphics::hist.default
  https://svn.r-project.org/R/trunk/src/library/graphics/R/hist.R

* grDevices::nclass.FD
  https://svn.r-project.org/R/trunk/src/library/grDevices/R/calc.R

* base::pretty.default
  https://svn.r-project.org/R/trunk/src/library/base/R/pretty.R

`nclass.FD` examines the inter-quartile range of `x`, and gets a positive, but
very small floating point value -- let's call it TINYFLOAT. It inserts this
ultra-low IQR into the `nclass` denominator, which means `nclass`
becomes a huge number -- let's call it BIGFLOAT. `nclass.FD` then returns this
huge value to `hist`.

Once `hist` has its 'number of breaks' suggestion, it feeds this
number to `pretty`:

    pretty(range(x), BIGFLOAT, min.n = 1)

`pretty`, in turn, calls

    .Internal(pretty(min(x), max(x), BIGFLOAT, min.n, shrink.sml,
                     c(high.u.bias, u5.bias), eps.correct))

This call fails with the error and warning shown at the start of this e-mail.
(Invalid 'n' argument / NAs introduced by coercion to integer range.) My
reading is that .Internal tried to coerce BIGFLOAT to integer range and
produced an NA, and that (the C implementation of) `pretty`, in turn, choked
when confronted with NA.
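The arithmetic in the analysis above can be traced without calling hist() at all. The following is a minimal sketch, assuming the Freedman-Diaconis rule in its textbook form (bin width 2 * IQR * n^(-1/3)); the exact body of `nclass.FD` may differ between R versions, and the variable names are illustrative only.

```r
# Trace TINYFLOAT -> BIGFLOAT -> NA using the example data from the report.
x <- c(1, 1, 1, 1 + 1e-15, 2)

# TINYFLOAT: the inter-quartile range is positive but only a few ulps wide.
tiny_iqr <- stats::IQR(x)

# BIGFLOAT: range divided by the Freedman-Diaconis bin width, rounded up.
# (Assumed textbook formula; not copied from the nclass.FD source.)
bin_width <- 2 * tiny_iqr * length(x)^(-1/3)
big_nclass <- ceiling(diff(range(x)) / bin_width)

# The suggestion is far larger than the largest R integer ...
exceeds_int <- big_nclass > .Machine$integer.max

# ... so coercing it to integer gives NA (warning suppressed on purpose).
as_int <- suppressWarnings(as.integer(big_nclass))

print(c(iqr = tiny_iqr, nclass = big_nclass))
print(exceeds_int)
print(is.na(as_int))
```

This only mirrors the reasoning in the analysis; it does not show where inside `pretty` the NA is rejected.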
Spencer Graves
2017-May-18 21:05 UTC
[Rd] Bug: floating point bug in nclass.FD can cause hist() to crash
I just got the same error message with

> sessionInfo()
R version 3.4.0 (2017-04-21)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils
[5] datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.0 tools_3.4.0

On 2017-05-18 3:50 PM, Sietse Brouwer wrote:
> Hello everybody,
>
> This is a bug involving functions in core R package:
> graphics::hist.default, grDevices::nclass.FD, and
> base::pretty.default. It is not yet on Bugzilla. I cannot submit it
> myself, as I do not have an account. Could somebody else add it for
> me, perhaps? That would be much appreciated.
>
> Kind regards,
>
> Sietse
> Sietse Brouwer
>
>
> Summary
> -------
>
> Floating point errors can cause a data vector to have an ultra-small
> inter-quartile range, which causes `grDevices::nclass.FD` to suggest
> an absurdly large number of breaks to `graphics::hist(breaks="FD")`.
> Because this large float becomes NA when converted to integer, hist's
> call to `base::pretty` crashes.
>
> How could nclass.FD, which has the job of suggesting a reasonable number of
> classes, avoid suggesting an absurdly large number of classes when the
> inter-quartile range is absurdly small compared to the range?
Sietse Brouwer
2017-May-20 16:20 UTC
[Rd] Bug: floating point bug in nclass.FD can cause hist() to crash
Hi, all,

Sietse wrote:
> Floating point errors can cause a data vector to have an ultra-small
> inter-quartile range, which causes `grDevices::nclass.FD` to suggest
> an absurdly large number of breaks to `graphics::hist(breaks="FD")`.
> Because this large float becomes NA when converted to integer, hist's
> call to `base::pretty` crashes.

I have been provided with an account, and filed the bug at
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17274

Discussion continues there.

Cheers,
Sietse
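A possible user-side workaround for the failure described in this thread is to cap the Freedman-Diaconis suggestion before passing it to hist(). The sketch below is only an illustration under stated assumptions: `capped_fd_breaks` and `max_breaks` are hypothetical names, the cap of 1000 is arbitrary, and this is not presented as the fix adopted in R.

```r
# Hypothetical helper: ask nclass.FD for a suggestion, but never return
# more than max_breaks classes (and fall back to the cap if the
# suggestion is not a finite number).
capped_fd_breaks <- function(x, max_breaks = 1000L) {
  n <- grDevices::nclass.FD(x)
  if (!is.finite(n) || n > max_breaks) max_breaks else n
}

x <- c(1, 1, 1, 1 + 1e-15, 2)

# plot = FALSE computes the histogram without opening a graphics device;
# drop it to draw the plot.
h <- hist(x, breaks = capped_fd_breaks(x), plot = FALSE)
print(length(h$breaks))
```

Capping trades a faithful Freedman-Diaconis bin width for a histogram that can always be computed; whether a fixed cap is appropriate depends on the data.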