David J. Birke
2019-Jun-20 00:24 UTC
[Rd] base::format adds extraneous whitespace for some inputs
Dear R Core Team,

First of all, thank you for your amazing work on developing and maintaining this wonderful language.

I just stumbled upon the following behavior in R version 3.6.0:

    format(9.91, digits = 2, nsmall = 2)
    format(9.99, digits = 2, nsmall = 2)

yield "9.91" and " 9.99" with an extraneous whitespace.

My expected output for the second command is "9.99".

I have not found anything explaining the whitespace in the help files. Therefore, I am writing to report this behavior as a possible bug.

Best wishes,
David
Sarah Goslee
2019-Jun-20 13:56 UTC
[Rd] base::format adds extraneous whitespace for some inputs
I can reproduce this.

It has to do with whether the value rounds down to 9 or up to 10, and thus needs another space, I think. I agree that it shouldn't happen, but at least you can get rid of the space by using trim = TRUE.

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 28 (Workstation Edition)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] colorout_1.2-0

loaded via a namespace (and not attached):
[1] compiler_3.5.3 tools_3.5.3

> # rounds to 9 vs 10
>
> format(9.95, digits = 2)
[1] "9.9"
> format(9.96, digits = 2)
[1] "10"
>
> format(9.95, digits = 2, nsmall = 2)
[1] "9.95"
> format(9.96, digits = 2, nsmall = 2)
[1] " 9.96"
>
> format(9.95, digits = 2, nsmall = 2, trim=TRUE)
[1] "9.95"
> format(9.96, digits = 2, nsmall = 2, trim=TRUE)
[1] "9.96"
>
> # rounds to 99 vs 100
>
> format(99.94, digits = 3)
[1] "99.9"
> format(99.95, digits = 3)
[1] "100"
>
> format(99.94, digits = 3, nsmall = 2)
[1] "99.94"
> format(99.95, digits = 3, nsmall = 2)
[1] " 99.95"
>
> format(99.94, digits = 3, nsmall = 2, trim=TRUE)
[1] "99.94"
> format(99.95, digits = 3, nsmall = 2, trim=TRUE)
[1] "99.95"

-- 
Sarah Goslee (she/her)
http://www.numberwright.com
Martin Maechler
2019-Jun-20 15:27 UTC
[Rd] base::format adds extraneous whitespace for some inputs
>>>>> Sarah Goslee
>>>>>     on Thu, 20 Jun 2019 09:56:44 -0400 writes:

    > I can reproduce this.
    > It has to do with whether the value rounds down to 9 or up to 10, and
    > thus needs another space, I think. I agree that it shouldn't happen,
    > but at least you can get rid of the space by using trim = TRUE.

Yes, indeed; I had wanted to reply earlier, but did not get to it.

I agree that this is bogus; I've never encountered it, as I've (almost?) never used 'nsmall' consciously.

Interestingly, this behavior has probably existed unchanged for close to R's full history. The 'nsmall = *' optional argument (of format.default(), to be precise) was introduced in R 1.3.0 in 2001, and in my still-working version of R 1.3.1 the behavior seems similar (though not identical), I think.

You can access the underlying computations from the R level using format.info(). It calls into the same C code that is really used here by the .Internal(format(...)) call, e.g.

  > format.info(9.91, 2, 2)
  [1] 4 2 0    ==> result will use 4 characters
  > format.info(9.99, 2, 2)
  [1] 5 2 0    ==> result will use 5 characters

-----------------

One more thing: format() has really been designed (in S, and inherited by R) to format *several* numbers, often matrices (or data frames if you must), so that they print and look nice. For this (in cases like these, with numbers), format() must find a common format for all numbers, and that is the reason the underlying algorithm is quite sophisticated: it needs to cover many borderline cases, notably deciding when exponential format is needed, etc.

For format()ting simple numbers (i.e. numeric vectors of length *one*), using formatC() (or even sprintf()) is typically faster and easier; for sprintf() you need to know C-standard formatting a bit.
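The suggestions made in this thread can be put side by side in a minimal sketch. The variable names (x, s_sprintf, s_formatC, s_trim, padded) are illustrative only and do not come from the thread. The widths that format.info() reports for the nsmall case may differ between R versions, so they are printed here rather than stated.

```r
# Sketch only: compares the approaches mentioned in the thread.
x <- c(9.91, 9.99)

# format.info() returns c(width, digits after the decimal point, exponent flag)
# for the format that format() would choose. The width for 9.99 is the value
# discussed above; it is printed, not assumed, since it may vary by R version.
for (v in x) {
  info <- format.info(v, digits = 2, nsmall = 2)
  cat(v, ": width =", info[1], ", decimals =", info[2], "\n")
}

# Single numbers in fixed notation with exactly two decimals and no padding.
s_sprintf <- sprintf("%.2f", x)
s_formatC <- formatC(x, format = "f", digits = 2)

# The workaround from earlier in the thread: trim = TRUE suppresses padding.
s_trim <- vapply(
  x,
  function(v) format(v, digits = 2, nsmall = 2, trim = TRUE),
  character(1)
)

# format() is designed for vectors: all elements are padded to a common width.
padded <- format(c(1, 10, 100))

print(s_sprintf)
print(s_formatC)
print(s_trim)
print(padded)
```

For one-off formatting of a single value, sprintf() or formatC() avoids the common-width logic entirely; format() with trim = TRUE keeps format()'s digit selection while dropping the padding.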