I think keeping it simple and less restrictive is the best approach,
for ease of implementation, limiting future maintenance, and so users
have the flexibility to format these however they wish. So I would
probably lean towards allowing multiple delimiters anywhere (including
trailing) or possibly just between digits.

On Fri, Jul 15, 2022 at 2:26 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> Thanks for posting that list. The Python document is the only one I've
> read so far; it has a really nice summary
> (https://peps.python.org/pep-0515/#prior-art) of the differences in
> implementations among 10 languages. Which choice would you recommend,
> and why?
>
> - I think Ivan's quick solution doesn't quite match any of them.
> - C, Fortran and C++ have special support in R, but none of them use
>   underscore separators.
> - C++ does support separators, but uses "'", not "_", and some ancient
>   forms of Fortran ignore embedded spaces.
>
> Duncan Murdoch
>
> On 15/07/2022 1:58 p.m., Jim Hester wrote:
> > Allowing underscores in numeric literals is becoming a very common
> > feature in computing languages. All of these languages (and more) now
> > support it
> >
> > python: https://peps.python.org/pep-0515/
> > javascript: https://v8.dev/features/numeric-separators
> > julia: https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-Point-Numbers
> > java: https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html#:~:text=In%20Java%20SE%207%20and,the%20readability%20of%20your%20code.
> > ruby: https://docs.ruby-lang.org/en/2.0.0/syntax/literals_rdoc.html#label-Numbers
> > perl: https://perldoc.perl.org/perldata#Scalar-value-constructors
> > rust: https://doc.rust-lang.org/rust-by-example/primitives/literals.html
> > C#: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types#real-literals
> > go: https://go.dev/ref/spec#Integer_literals
> >
> > Its use in this context also dates back to at least Ada 83
> > (http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#:~:text=A%20decimal%20literal%20is%20a,the%20base%20is%20implicitly%20ten).&text=An%20underline%20character%20inserted%20between,value%20of%20this%20numeric%20literal.)
> >
> > Many other communities see the benefit of this feature, I think R's
> > community would benefit from it as well.
> >
> > On Fri, Jul 15, 2022 at 1:22 PM Ivan Krylov <krylov.r00t at gmail.com> wrote:
> >>
> >> On Fri, 15 Jul 2022 11:25:32 -0400
> >> <avi.e.gross at gmail.com> wrote:
> >>
> >>> R normally delays evaluation so chunks of code are handed over
> >>> untouched to functions that often play with the text directly without
> >>> evaluating it until, perhaps, much later.
> >>
> >> Do they play with the text, or with the syntax tree after it went
> >> through the parser? While it's true that R saves the source text of the
> >> functions for ease of debugging, it's not guaranteed that a given
> >> object will have source references, and typical NSE functions operate
> >> on language objects which are tree-like structures containing R values,
> >> not source text.
> >>
> >> You are, of course, right that any changes to the syntax of the
> >> language must be carefully considered, but if anyone wants to play with
> >> this idea, it can be implemented in a very simple manner:
> >>
> >> --- src/main/gram.y (revision 82598)
> >> +++ src/main/gram.y (working copy)
> >> @@ -2526,7 +2526,7 @@
> >>      YYTEXT_PUSH(c, yyp);
> >>      /* We don't care about other than ASCII digits */
> >>      while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
> >> -           || c == 'x' || c == 'X' || c == 'L')
> >> +           || c == 'x' || c == 'X' || c == 'L' || c == '_')
> >>      {
> >>          count++;
> >>          if (c == 'L') /* must be at the end. Won't allow 1Le3 (at present). */
> >> @@ -2533,6 +2533,9 @@
> >>          {   YYTEXT_PUSH(c, yyp);
> >>              break;
> >>          }
> >> +        if (c == '_') { /* allow an underscore anywhere inside the literal */
> >> +            continue;
> >> +        }
> >>
> >>          if (c == 'x' || c == 'X') {
> >>              if (count > 2 || last != '0') break; /* 0x must be first */
> >>
> >> To an NSE function, the underscored literals are indistinguishable from
> >> normal ones, because they don't see the literals:
> >>
> >> stopifnot(all.equal(\() 1000000, \() 1_000_000))
> >> f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
> >> f(1e6, 1_000_000)
> >>
> >> Although it's true that the source references change as a result:
> >>
> >> lapply(
> >>   list(\() 1000000, \() 1_000_000),
> >>   \(.) as.character(getSrcref(.))
> >> )
> >> # [[1]]
> >> # [1] "\\() 1000000"
> >> #
> >> # [[2]]
> >> # [1] "\\() 1_000_000"
> >>
> >> This patch is somewhat simplistic: it allows both multiple underscores
> >> in succession and underscores at the end of the number literal. Perl
> >> does so too, but with a warning:
> >>
> >> perl -wE'say "true" if 1__000_ == 1000'
> >> # Misplaced _ in number at -e line 1.
> >> # Misplaced _ in number at -e line 1.
> >> # true
> >>
> >> --
> >> Best regards,
> >> Ivan
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel
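
As a rough illustration of the permissive rule discussed in the message above (underscores accepted anywhere in a literal, including repeated and trailing ones), here is a minimal standalone C sketch. It is not R's lexer: `strip_underscores` and `parse_lenient` are hypothetical names, the 64-byte buffer is an arbitrary assumption, and the C library's `strtod` stands in for R's own number conversion, so R-specific forms such as the `L` suffix are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of R): copy `src` into `dst`, dropping
 * every '_' regardless of where it appears.
 * Returns 0 on success, -1 if `dst` cannot hold the result.
 */
static int strip_underscores(const char *src, char *dst, size_t dstsz)
{
    assert(src != NULL && dst != NULL);
    if (dstsz == 0)
        return -1;

    size_t n = 0;
    for (; *src != '\0'; src++) {
        if (*src == '_')
            continue;              /* separator: skip it, keep scanning */
        if (n + 1 >= dstsz)
            return -1;             /* no room for this char plus the NUL */
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return 0;
}

/*
 * Hypothetical wrapper: parse `literal` under the permissive rule.
 * Stores the value in *out and returns 0, or returns -1 when nothing
 * numeric remains or trailing characters are left over.
 */
static int parse_lenient(const char *literal, double *out)
{
    assert(literal != NULL && out != NULL);

    char buf[64];                  /* assumed upper bound for a literal */
    if (strip_underscores(literal, buf, sizeof buf) != 0)
        return -1;

    char *end = NULL;
    double value = strtod(buf, &end);
    if (end == buf || *end != '\0')
        return -1;                 /* empty input or unparsed remainder */

    *out = value;
    return 0;
}
```

Under this sketch `parse_lenient("1__000_", &v)` succeeds with `v` equal to 1000, the same value Perl reports in the quoted example, only without any warning about misplaced underscores.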
Bill Dunlap
2022-Jul-15 19:34 UTC
[Rd] Feature Request: Allow Underscore Separated Numbers
The token '._1' (period underscore digit) is currently parsed as a symbol
(name). It would become a number if underscore were ignored as in the first
proposal. The just-between-digits alternative would avoid this change.

-Bill

On Fri, Jul 15, 2022 at 12:26 PM Jim Hester <james.f.hester at gmail.com> wrote:
> I think keeping it simple and less restrictive is the best approach,
> for ease of implementation, limiting future maintenance, and so users
> have the flexibility to format these however they wish. So I would
> probably lean towards allowing multiple delimiters anywhere (including
> trailing) or possibly just between digits.
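
Bill's point about '._1' can be made concrete with a small check for the just-between-digits rule. The following is only a sketch under stated assumptions: `underscores_between_digits` is a hypothetical function, not code from R's `gram.y`, and it considers ASCII decimal digits only, so a hexadecimal literal with a separator next to a letter digit would be rejected here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check (not R's lexer): returns true when every '_' in `s`
 * sits directly between two ASCII decimal digits. A string without any
 * underscore passes trivially.
 */
static bool underscores_between_digits(const char *s)
{
    assert(s != NULL);

    for (size_t i = 0; s[i] != '\0'; i++) {
        if (s[i] != '_')
            continue;
        /* i == 0 means there is no character before the underscore */
        if (i == 0 || !isdigit((unsigned char)s[i - 1]))
            return false;
        /* for a trailing underscore s[i + 1] is the NUL terminator,
           which isdigit() rejects */
        if (!isdigit((unsigned char)s[i + 1]))
            return false;
    }
    return true;
}
```

With this rule `1_000_000` is accepted, while `._1`, `1_` and `1__000_` are all rejected as numeric literals, so a token such as `._1` would not be reinterpreted as a number on account of its underscore.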