thr3ads.net - R help - [R] Unexpected behaviour when using format(x, scientific = TRUE) on integer vector [Jun 2021]


Remo Röthlin

2021-Jun-19 13:58 UTC

[R] Unexpected behaviour when using format(x, scientific = TRUE) on integer vector

Dear useRs

I'm encountering unexpected behaviour when applying format(x,
scientific = TRUE) to integer vectors (but not to double vectors).
The resulting string is not formatted in scientific notation; with
formatC() instead, the result is as expected.

Is this the expected behaviour of format(x, scientific = TRUE)? I haven't found
any information or discussion of a difference in scientific notation between
format() and formatC().

Both functions are implemented as .Internal() functions in C. While
do_formatC() uses C's built-in formatting capabilities directly, do_format()
does additional work.
Unfortunately, my knowledge of R internals is not good enough to see why
format() treats integers differently in this case.
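One way to see where that additional work starts is to look at the R-level source of format.default(), the method format() dispatches to for plain (unclassed) integer and double vectors. This is a minimal sketch assuming a standard base R installation; the object names src and internal_calls are illustrative only. It merely locates the line that hands the vector to .Internal(format(...)), i.e. to do_format() in C.

```r
# Deparse the body of the default method that format() dispatches to
# for plain (unclassed) integer and double vectors
src <- deparse(body(format.default))

# Keep only the line(s) that pass the vector on to the internal C code
internal_calls <- grep(".Internal(format(", src, fixed = TRUE, value = TRUE)
print(internal_calls)
```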

Warm regards,

Remo

sessionInfo() output and code to reproduce the issue (also reproduced on
Windows 10 x64 with R 4.1.0, and on RStudio Cloud with R 3.6.3 and R 4.0.3):

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.0
> Sys.getlocale()
[1] "de_CH.UTF-8/de_CH.UTF-8/de_CH.UTF-8/C/de_CH.UTF-8/de_CH.UTF-8"
>
> numvec <- c(-1.23e4, 1.23e4)
> typeof(numvec) # double
[1] "double"
>
> intvec <- c(-1.23e4L, 1.23e4L)
> typeof(intvec) # integer
[1] "integer"
>
> numvec2 <- as.double(intvec)
> identical(numvec, numvec2)
[1] TRUE
>
> formatC(numvec, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
> format(numvec, scientific = TRUE) # Formatted as scientific notation
[1] "-1.23e+04" " 1.23e+04"
>
> formatC(intvec, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
> format(intvec, scientific = TRUE) # *Not* formatted as scientific notation
[1] "-12300" " 12300"
>
> formatC(numvec2, format = "e") # Formatted as scientific notation
[1] "-1.2300e+04" "1.2300e+04"
> format(numvec2, scientific = TRUE) # Formatted as scientific notation
[1] "-1.23e+04" " 1.23e+04"

Bert Gunter

2021-Jun-19 19:07 UTC


The behavior is **as documented on the man page**, which is always the first
thing you should consult. It can be terse, but it is almost always accurate
once deciphered.
In this case, ?format says of the "scientific" parameter (emphasis
added):

scientific
Either a logical specifying whether elements of a **real or complex
vector** should be encoded in scientific format, or an integer penalty (see
options("scipen")). Missing values correspond to the current default
penalty.

Your vector is integer, right? The options("scipen") man page also
indicates that fixed format will be used.
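Since ?format also lets "scientific" be an integer penalty, the difference can be seen without changing the global options("scipen") setting. A small sketch (the object names are illustrative): a strongly negative penalty pushes a double towards scientific notation, while the same value stored as an integer is still formatted in fixed notation.

```r
x_dbl <- 12300   # double
x_int <- 12300L  # integer

# With a penalty of -10, fixed notation is used only if it is at least
# 10 characters narrower than scientific notation (see ?options, "scipen")
dbl_out <- format(x_dbl, scientific = -10)
int_out <- format(x_int, scientific = -10)  # the penalty has no effect on integers

print(dbl_out)  # "1.23e+04"
print(int_out)  # "12300"
```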


Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)



Duncan Murdoch

2021-Jun-19 23:28 UTC


On 19/06/2021 9:58 a.m., Remo Röthlin wrote:
> Dear useRs
>
> I'm encountering an unexpected behaviour when trying to apply format(x,
> scientific = TRUE) on integer vectors (but not double vectors).
> The resulting string is not formatted in scientific notation, however,
> using formatC() instead, the result is as expected.
>
> Is this the expected behaviour of format(x, scientific = TRUE)? I haven't
> found any information or discussion on a difference in scientific notation
> between format and formatC.

If you look at the internals of the format.default() function, you'll
see that it ignores the "scientific" argument when its input is of
type integer:

https://github.com/wch/r-source/blob/23dc578c6f40acdf53f92bab88cf91ecd25cd2e8/src/main/paste.c#L543-L552

The help page describes that argument as:

`Either a logical specifying whether elements of a real or complex 
vector should be encoded in scientific format, or an integer penalty 
(see options("scipen")). Missing values correspond to the current 
default penalty.`

so there's no reason to expect it to apply to integer vectors as well.
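If scientific notation is needed for integer data, the documented behaviour suggests coercing to double first, which is what the as.double() step in the original post does. A minimal wrapper sketch; format_sci() is a hypothetical helper, not a base R function:

```r
# Hypothetical helper (not part of base R): honour scientific = TRUE
# for integer input by converting it to double before formatting.
format_sci <- function(x, ...) {
  if (is.integer(x)) {
    x <- as.double(x)  # format() never uses scientific notation for integers
  }
  format(x, scientific = TRUE, ...)
}

format_sci(c(-12300L, 12300L))  # "-1.23e+04" " 1.23e+04"
```

Coercing is lossless here because every integer value is exactly representable as a double.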

I suspect the reason for this goes back to S, which was influenced more
by Fortran than by C, and I think Fortran (at least as it was in the
1970s and 1980s) never used scientific notation for integers.

Duncan Murdoch