Remo Röthlin
2021-Jun-19 13:58 UTC
[R] Unexpected behaviour when using format(x, scientific = TRUE) on integer vector
Dear useRs

I'm encountering an unexpected behaviour when trying to apply format(x, scientific = TRUE) on integer vectors (but not double vectors). The resulting string is not formatted in scientific notation, however, using formatC() instead, the result is as expected.

Is this the expected behaviour of format(x, scientific = TRUE)? I haven't found any information or discussion on a difference in scientific notation between format and formatC.

Both functions are implemented as .Internal() functions in C, and while do_formatC() uses C's directly built-in capabilities to format, do_format() does additional work. Unfortunately my knowledge of R internals is not good enough to see why format() treats integers differently in this case.

Warm regards,
Remo

SessionInfo and code to reproduce the issue with output (was also reproduced on Windows 10 x64 R 4.1.0 and RStudio Cloud R 3.6.3 & R 4.0.3):

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.1.0

> Sys.getlocale()
[1] "de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8"
>
> numvec <- c(-1.23e4, 1.23e4)
> typeof(numvec) # double
[1] "double"
>
> intvec <- c(-1.23e4L, 1.23e4L)
> typeof(intvec) # integer
[1] "integer"
>
> numvec2 <- as.double(intvec)
> identical(numvec, numvec2)
[1] TRUE
>
> formatC(numvec, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
> format(numvec, scientific = TRUE) # Formatted as scientific notation
[1] "-1.23e+04" " 1.23e+04"
>
> formatC(intvec, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
> format(intvec, scientific = TRUE) # *Not* formatted as scientific notation
[1] "-12300" " 12300"
>
> formatC(numvec2, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
> format(numvec2, scientific = TRUE) # Formatted as scientific notation
[1] "-1.23e+04" " 1.23e+04"
Bert Gunter
2021-Jun-19 19:07 UTC
[R] Unexpected behaviour when using format(x, scientific = TRUE) on integer vector
The behavior is **as documented on the man page**, always the first thing you should consult. It can be terse, but almost always accurate once deciphered. In this case, ?format says for the "scientific" parameter (highlighting added):

scientific: Either a logical specifying whether elements of a **real or complex vector** should be encoded in scientific format, or an integer penalty (see options("scipen")). Missing values correspond to the current default penalty.

Your vector is integer, right? The options("scipen") man page also indicates that fixed format will be used.

Cheers,
Bert
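To make the documented distinction concrete, here is a minimal sketch comparing the two forms of the "scientific" argument on integer and double input. The variable names (intvec, dblvec, int_pen, dbl_pen) and the penalty value -10L are illustrative choices, not taken from the help page, and the sketch assumes an R 4.x session with default options (in particular the default "digits" setting).

```r
intvec <- c(-12300L, 12300L)
dblvec <- as.double(intvec)   # same values, stored as doubles

# Logical form: only the double vector switches to scientific notation.
format(intvec, scientific = TRUE)   # [1] "-12300" " 12300"
format(dblvec, scientific = TRUE)   # [1] "-1.23e+04" " 1.23e+04"

# Penalty form: a strongly negative penalty biases doubles toward
# scientific notation; integer input is still printed in fixed notation.
int_pen <- format(intvec, scientific = -10L)
dbl_pen <- format(dblvec, scientific = -10L)
int_pen
dbl_pen
```

In both forms the argument only changes the result once the values are stored as doubles, which matches the "real or complex vector" wording quoted above.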
Duncan Murdoch
2021-Jun-19 23:28 UTC
[R] Unexpected behaviour when using format(x, scientific = TRUE) on integer vector
On 19/06/2021 9:58 a.m., Remo Röthlin wrote:
> Dear useRs
>
> I'm encountering an unexpected behaviour when trying to apply format(x, scientific = TRUE) on integer vectors (but not double vectors).
> The resulting string is not formatted in scientific notation, however, using formatC() instead, the result is as expected.
>
> Is this the expected behaviour of format(x, scientific = TRUE)? I haven't found any information or discussion on a difference in scientific notation between format and formatC.

If you look at the internals of the format.default() function, you'll see that it ignores the "scientific" argument when the type of the argument is integer:

https://github.com/wch/r-source/blob/23dc578c6f40acdf53f92bab88cf91ecd25cd2e8/src/main/paste.c#L543-L552

The help page describes that argument as:

`Either a logical specifying whether elements of a real or complex vector should be encoded in scientific format, or an integer penalty (see options("scipen")). Missing values correspond to the current default penalty.`

so there's no reason to expect it applies to integer vectors as well.

I suspect the reason for this goes back to S, which was influenced more by Fortran than by C: and I think Fortran (at least as it was in the 70s and 80s) never used scientific notation on integers.

Duncan Murdoch
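If scientific notation is wanted regardless of storage type, one possible workaround is a small wrapper that coerces integers to double before calling format(). The sketch below is only an illustration: format_sci() is a hypothetical helper, not a function in base R, and it assumes default options. Because R integers are 32-bit, the coercion to double does not lose precision.

```r
# Hypothetical helper (not part of base R): format a numeric vector in
# scientific notation, coercing integer input to double first.
format_sci <- function(x, ...) {
  stopifnot(is.numeric(x))
  if (is.integer(x)) {
    # format() applies `scientific` to real/complex vectors only,
    # so hand it a double vector instead.
    x <- as.double(x)
  }
  format(x, scientific = TRUE, ...)
}

intvec <- c(-12300L, 12300L)

format_sci(intvec)
# [1] "-1.23e+04" " 1.23e+04"

# formatC() is an alternative that handles integer input directly,
# as the original post shows.
formatC(intvec, format = "e")
# [1] "-1.2300e+04" "1.2300e+04"
```

The wrapper keeps format()'s common-width padding and its other arguments (passed through `...`), whereas formatC() follows C-style conventions for digits and width, so which one fits better depends on the output you need.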