Suharto Anggono
2013-Oct-02 07:49 UTC
[Rd] For numeric x, as.character(x) doesn't always match signif(x, 15)
I saw something like this.

> x <- 5180000000000003
> print(x, digits=20)
[1] 5180000000000003
> as.character(x)
[1] "5.18e+15"

I thought it was because, when x is numeric, as.character(x) represents x rounded to 15 significant digits.

> print(signif(x, 15), digits=20)
[1] 5180000000000000.0000
> as.numeric(as.character(x)) == signif(x, 15)
[1] TRUE

The documentation for 'as.character' in R states this in the "Details" section.

    'as.character' represents real and complex numbers to 15 significant digits (technically the compiler's setting of the ISO C constant 'DBL_DIG', which will be 15 on machines supporting IEC60559 arithmetic according to the C99 standard). This ensures that all the digits in the result will be reliable (and not the result of representation error), but does mean that conversion to character and back to numeric may change the number. If you want to convert numbers to character with the maximum possible precision, use 'format'.

But then I was surprised to also see this, where as.character(x) didn't match signif(x, 15).

> x <- 1234567890123456
> print(x, digits=20)
[1] 1234567890123456
> as.character(x)
[1] "1234567890123456"
> print(signif(x, 15), digits=20)
[1] 1234567890123460
> as.numeric(as.character(x)) == signif(x, 15)
[1] FALSE

I then found another example of this behavior in https://stat.ethz.ch/pipermail/r-devel/2009-May/053341.html.

It seems that, for numeric x, the result of 'as.character' equals format(., digits=15) applied to each element individually. Is that always the case?

> format(5180000000000003, digits=15)
[1] "5.18e+15"
> format(1234567890123456, digits=15)
[1] "1234567890123456"

I assume that format(x, digits=15) behaves like print(x, digits=15).

> print(5180000000000003, digits=15)
[1] 5.18e+15
> print(1234567890123456, digits=15)
[1] 1234567890123456

The result of print(1234567890123456, digits=15) violates the part "at least one entry will be encoded with that minimum number" in the "Details" section of the documentation for 'print.default':

    The same number of decimal places is used throughout a vector. This means that 'digits' specifies the minimum number of significant digits to be used, and that at least one entry will be encoded with that minimum number.

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.2
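One possible reading of the two cases above can be sketched in Python, whose sys.float_info.dig mirrors C's DBL_DIG. The sketch contrasts strict 15-significant-digit C-style formatting ("%.15g") with a hypothetical, simplified model of how format()/print() might choose between fixed and scientific notation. The model is an assumption, not R's actual formatReal() code: it finds the fewest significant digits (at most 15) that represent x, then uses fixed notation whenever that is no wider than scientific notation (default scipen = 0). Because fixed notation prints every integer digit of x, the 16th digit of 1234567890123456 would survive, which would match the outputs above. The helper names signif, min_sig_digits and format_like_r are illustrative only; signif here is a stand-in for R's signif(), not its implementation.

```python
import sys

DBL_DIG = sys.float_info.dig  # C's DBL_DIG; 15 for IEEE 754 doubles


def signif(x: float, digits: int) -> float:
    """Stand-in for R's signif(): round x to `digits` significant digits."""
    return float(f"{x:.{digits - 1}e}")


def min_sig_digits(x: float, digits: int) -> int:
    """Fewest significant digits (<= digits) that reproduce signif(x, digits)."""
    target = signif(x, digits)
    for p in range(1, digits):
        if signif(x, p) == target:
            return p
    return digits


def format_like_r(x: float, digits: int = DBL_DIG) -> str:
    """Hypothetical, simplified model of the fixed-vs-scientific choice.

    NOT R's formatReal(): assumes fixed notation is chosen whenever it is
    no wider than scientific notation (as with the default scipen = 0).
    """
    nsig = min_sig_digits(x, digits)
    sci = f"{x:.{nsig - 1}e}"            # scientific form with nsig digits
    kpower = int(sci.split("e")[1])      # decimal exponent after rounding
    rgt = max(0, nsig - kpower - 1)      # digits needed after the decimal point
    fixed = f"{x:.{rgt}f}"               # prints ALL integer digits of x
    return fixed if len(fixed) <= len(sci) else sci


for x in (5180000000000003.0, 1234567890123456.0):
    s = format_like_r(x)
    print(f"{x:.0f}: %.{DBL_DIG}g -> {x:.{DBL_DIG}g}; model -> {s}; "
          f"equals signif(x, {DBL_DIG}): {float(s) == signif(x, DBL_DIG)}")

# On an IEEE 754 platform this prints:
# 5180000000000003: %.15g -> 5.18e+15; model -> 5.18e+15; equals signif(x, 15): True
# 1234567890123456: %.15g -> 1.23456789012346e+15; model -> 1234567890123456; equals signif(x, 15): False
```

Under this model, the fixed form of 1234567890123456 (16 characters) is narrower than its 15-digit scientific form "1.23456789012346e+15" (20 characters), so no rounding to 15 digits is visible in the result.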