Allowing underscores in numeric literals is becoming a very common
feature in computing languages. All of these languages (and more) now
support it:

python: https://peps.python.org/pep-0515/
javascript: https://v8.dev/features/numeric-separators
julia: https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-Point-Numbers
java: https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html#:~:text=In%20Java%20SE%207%20and,the%20readability%20of%20your%20code.
ruby: https://docs.ruby-lang.org/en/2.0.0/syntax/literals_rdoc.html#label-Numbers
perl: https://perldoc.perl.org/perldata#Scalar-value-constructors
rust: https://doc.rust-lang.org/rust-by-example/primitives/literals.html
C#: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types#real-literals
go: https://go.dev/ref/spec#Integer_literals

Its use in this context also dates back to at least Ada 83
(http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#:~:text=A%20decimal%20literal%20is%20a,the%20base%20is%20implicitly%20ten).&text=An%20underline%20character%20inserted%20between,value%20of%20this%20numeric%20literal.)

Many other communities see the benefit of this feature; I think R's
community would benefit from it as well.

On Fri, Jul 15, 2022 at 1:22 PM Ivan Krylov <krylov.r00t at gmail.com> wrote:
>
> On Fri, 15 Jul 2022 11:25:32 -0400
> <avi.e.gross at gmail.com> wrote:
>
> > R normally delays evaluation so chunks of code are handed over
> > untouched to functions that often play with the text directly without
> > evaluating it until, perhaps, much later.
>
> Do they play with the text, or with the syntax tree after it went
> through the parser? While it's true that R saves the source text of the
> functions for ease of debugging, it's not guaranteed that a given
> object will have source references, and typical NSE functions operate
> on language objects which are tree-like structures containing R values,
> not source text.
>
> You are, of course, right that any changes to the syntax of the
> language must be carefully considered, but if anyone wants to play with
> this idea, it can be implemented in a very simple manner:
>
> --- src/main/gram.y     (revision 82598)
> +++ src/main/gram.y     (working copy)
> @@ -2526,7 +2526,7 @@
>      YYTEXT_PUSH(c, yyp);
>      /* We don't care about other than ASCII digits */
>      while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
> -           || c == 'x' || c == 'X' || c == 'L')
> +           || c == 'x' || c == 'X' || c == 'L' || c == '_')
>      {
>          count++;
>          if (c == 'L') /* must be at the end. Won't allow 1Le3 (at present). */
> @@ -2533,6 +2533,9 @@
>          {   YYTEXT_PUSH(c, yyp);
>              break;
>          }
> +        if (c == '_') { /* allow an underscore anywhere inside the literal */
> +            continue;
> +        }
>
>          if (c == 'x' || c == 'X') {
>              if (count > 2 || last != '0') break; /* 0x must be first */
>
> To an NSE function, the underscored literals are indistinguishable from
> normal ones, because they don't see the literals:
>
> stopifnot(all.equal(\() 1000000, \() 1_000_000))
> f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
> f(1e6, 1_000_000)
>
> Although it's true that the source references change as a result:
>
> lapply(
>   list(\() 1000000, \() 1_000_000),
>   \(.) as.character(getSrcref(.))
> )
> # [[1]]
> # [1] "\\() 1000000"
> #
> # [[2]]
> # [1] "\\() 1_000_000"
>
> This patch is somewhat simplistic: it allows both multiple underscores
> in succession and underscores at the end of the number literal. Perl
> does so too, but with a warning:
>
> perl -wE'say "true" if 1__000_ == 1000'
> # Misplaced _ in number at -e line 1.
> # Misplaced _ in number at -e line 1.
> # true
>
> --
> Best regards,
> Ivan
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
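
The effect of the quoted patch can be illustrated outside of gram.y. What follows is only a minimal standalone sketch in C, not R's actual lexer: strip_underscores and parse_permissive are hypothetical names introduced here, and the 64-byte buffer is an arbitrary assumption. Like the patch, it drops every underscore before the digits are converted, so doubled and trailing underscores are accepted silently.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not part of R): copy src into dst, dropping every
   '_'. Returns the number of characters written, or -1 if dst is too small. */
static int strip_underscores(const char *src, char *dst, size_t dst_size)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return -1;
    for (; *src != '\0'; src++) {
        if (*src == '_')
            continue;       /* same effect as the `continue` in the patch */
        if (n + 1 >= dst_size)
            return -1;      /* no room for this char plus the terminator */
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return (int) n;
}

/* Hypothetical helper: parse a decimal literal, ignoring any '_'.
   Returns NAN if the literal does not fit the (assumed) 64-byte buffer. */
static double parse_permissive(const char *literal)
{
    char buf[64];

    if (strip_underscores(literal, buf, sizeof buf) < 0)
        return NAN;
    return strtod(buf, NULL);
}
```

Under these assumptions, parse_permissive("1__000_") gives the same value as parse_permissive("1000"), which mirrors the Perl result quoted above but without the warning.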
Duncan Murdoch
2022-Jul-15 18:26 UTC
[Rd] Feature Request: Allow Underscore Separated Numbers
Thanks for posting that list. The Python document is the only one I've
read so far; it has a really nice summary
(https://peps.python.org/pep-0515/#prior-art) of the differences in
implementations among 10 languages. Which choice would you recommend,
and why?

- I think Ivan's quick solution doesn't quite match any of them.

- C, Fortran and C++ have special support in R, but none of them use
  underscore separators.

- C++ does support separators, but uses "'", not "_", and some ancient
  forms of Fortran ignore embedded spaces.

Duncan Murdoch

On 15/07/2022 1:58 p.m., Jim Hester wrote:
> Allowing underscores in numeric literals is becoming a very common
> feature in computing languages. All of these languages (and more) now
> support it:
>
> python: https://peps.python.org/pep-0515/
> javascript: https://v8.dev/features/numeric-separators
> julia: https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-Point-Numbers
> java: https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html#:~:text=In%20Java%20SE%207%20and,the%20readability%20of%20your%20code.
> ruby: https://docs.ruby-lang.org/en/2.0.0/syntax/literals_rdoc.html#label-Numbers
> perl: https://perldoc.perl.org/perldata#Scalar-value-constructors
> rust: https://doc.rust-lang.org/rust-by-example/primitives/literals.html
> C#: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types#real-literals
> go: https://go.dev/ref/spec#Integer_literals
>
> Its use in this context also dates back to at least Ada 83
> (http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#:~:text=A%20decimal%20literal%20is%20a,the%20base%20is%20implicitly%20ten).&text=An%20underline%20character%20inserted%20between,value%20of%20this%20numeric%20literal.)
>
> Many other communities see the benefit of this feature; I think R's
> community would benefit from it as well.
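
On the question of which placement rule to pick: one stricter option, roughly the rule PEP 515 describes for decimal literals, is to accept an underscore only when it sits directly between two digits. A minimal C sketch of such a check follows. It is an illustration under stated assumptions, not a proposal for gram.y: underscores_well_placed is a hypothetical helper that is not part of R or of any patch in this thread, and it deliberately ignores hexadecimal literals and base prefixes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if every '_' in s has a decimal digit
   immediately before and immediately after it, 0 otherwise.
   Decimal literals only; hex digits and "0x" prefixes are not handled. */
static int underscores_well_placed(const char *s)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; s[i] != '\0'; i++) {
        if (s[i] != '_')
            continue;
        /* leading underscore, or previous character is not a digit */
        if (i == 0 || !isdigit((unsigned char) s[i - 1]))
            return 0;
        /* trailing or doubled underscore: next character is not a digit */
        if (!isdigit((unsigned char) s[i + 1]))
            return 0;
    }
    return 1;
}
```

Under this rule 1_000_000 passes, while 1__000_ (which the quick patch accepts) is rejected, as are forms such as 1_.5 and 1e_5.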