thr3ads.net - R devel - [Rd] Feature Request: Allow Underscore Separated Numbers [Jul 2022]

Jim Hester

2022-Jul-15 19:25 UTC

[Rd] Feature Request: Allow Underscore Separated Numbers

I think keeping it simple and less restrictive is the best approach,
for ease of implementation, limiting future maintenance, and so users
have the flexibility to format these however they wish. So I would
probably lean towards allowing multiple delimiters anywhere (including
trailing) or possibly just between digits.
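
As a rough illustration of the two options above (an editorial sketch, not code from R or from Ivan's patch), the permissive rule and the between-digits rule can be written as two small checks on plain decimal integer literals. The function names are made up for this example, and decimal points, exponents, hex prefixes and the L suffix are deliberately left out.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of R. Permissive rule: after the first
   digit, an underscore may appear anywhere, doubled or trailing included. */
bool underscores_ok_anywhere(const char *s)
{
    if (!isdigit((unsigned char) s[0])) return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!isdigit((unsigned char) s[i]) && s[i] != '_') return false;
    }
    return true;
}

/* Hypothetical helper, not part of R. Strict rule: every underscore must
   have a digit immediately before it and immediately after it. */
bool underscores_ok_between_digits(const char *s)
{
    if (!isdigit((unsigned char) s[0])) return false;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (isdigit((unsigned char) s[i])) continue;
        if (s[i] != '_') return false;
        /* s[i + 1] is at worst the terminating '\0', which is not a digit */
        if (!isdigit((unsigned char) s[i - 1]) ||
            !isdigit((unsigned char) s[i + 1])) return false;
    }
    return true;
}
```

Under these assumptions "1_000_000" passes both checks, while "1__000_" passes only the permissive one.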

On Fri, Jul 15, 2022 at 2:26 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>
> Thanks for posting that list.  The Python document is the only one I've
> read so far; it has a really nice summary
> (https://peps.python.org/pep-0515/#prior-art) of the differences in
> implementations among 10 languages.  Which choice would you recommend,
> and why?
>
>   - I think Ivan's quick solution doesn't quite match any of them.
>   - C, Fortran and C++ have special support in R, but none of them use
> underscore separators.
>   - C++ does support separators, but uses "'", not "_", and some ancient
> forms of Fortran ignore embedded spaces.
>
> Duncan Murdoch
>
> On 15/07/2022 1:58 p.m., Jim Hester wrote:
> > Allowing underscores in numeric literals is becoming a very common
> > feature in computing languages. All of these languages (and more) now
> > support it
> >
> > python: https://peps.python.org/pep-0515/
> > javascript: https://v8.dev/features/numeric-separators
> > julia: https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-Point-Numbers
> > java: https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html#:~:text=In%20Java%20SE%207%20and,the%20readability%20of%20your%20code.
> > ruby: https://docs.ruby-lang.org/en/2.0.0/syntax/literals_rdoc.html#label-Numbers
> > perl: https://perldoc.perl.org/perldata#Scalar-value-constructors
> > rust: https://doc.rust-lang.org/rust-by-example/primitives/literals.html
> > C#: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types#real-literals
> > go: https://go.dev/ref/spec#Integer_literals
> >
> > Its use in this context also dates back to at least Ada 83
> > (http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#:~:text=A%20decimal%20literal%20is%20a,the%20base%20is%20implicitly%20ten).&text=An%20underline%20character%20inserted%20between,value%20of%20this%20numeric%20literal.)
> >
> > Many other communities see the benefit of this feature, I think R's
> > community would benefit from it as well.
> >
> > On Fri, Jul 15, 2022 at 1:22 PM Ivan Krylov <krylov.r00t at gmail.com> wrote:
> >>
> >> On Fri, 15 Jul 2022 11:25:32 -0400
> >> <avi.e.gross at gmail.com> wrote:
> >>
> >>> R normally delays evaluation so chunks of code are handed over
> >>> untouched to functions that often play with the text directly without
> >>> evaluating it until, perhaps, much later.
> >>
> >> Do they play with the text, or with the syntax tree after it went
> >> through the parser? While it's true that R saves the source text of the
> >> functions for ease of debugging, it's not guaranteed that a given
> >> object will have source references, and typical NSE functions operate
> >> on language objects which are tree-like structures containing R values,
> >> not source text.
> >>
> >> You are, of course, right that any changes to the syntax of the
> >> language must be carefully considered, but if anyone wants to play with
> >> this idea, it can be implemented in a very simple manner:
> >>
> >> --- src/main/gram.y     (revision 82598)
> >> +++ src/main/gram.y     (working copy)
> >> @@ -2526,7 +2526,7 @@
> >>       YYTEXT_PUSH(c, yyp);
> >>       /* We don't care about other than ASCII digits */
> >>       while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
> >> -          || c == 'x' || c == 'X' || c == 'L')
> >> +          || c == 'x' || c == 'X' || c == 'L' || c == '_')
> >>       {
> >>          count++;
> >>          if (c == 'L') /* must be at the end.  Won't allow 1Le3 (at present). */
> >> @@ -2533,6 +2533,9 @@
> >>          {   YYTEXT_PUSH(c, yyp);
> >>              break;
> >>          }
> >> +       if (c == '_') { /* allow an underscore anywhere inside the literal */
> >> +           continue;
> >> +       }
> >>
> >>          if (c == 'x' || c == 'X') {
> >>              if (count > 2 || last != '0') break;  /* 0x must be first */
> >>
> >> To an NSE function, the underscored literals are indistinguishable from
> >> normal ones, because they don't see the literals:
> >>
> >> stopifnot(all.equal(\() 1000000, \() 1_000_000))
> >> f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
> >> f(1e6, 1_000_000)
> >>
> >> Although it's true that the source references change as a result:
> >>
> >> lapply(
> >>   list(\() 1000000, \() 1_000_000),
> >>   \(.) as.character(getSrcref(.))
> >> )
> >> # [[1]]
> >> # [1] "\\() 1000000"
> >> #
> >> # [[2]]
> >> # [1] "\\() 1_000_000"
> >>
> >> This patch is somewhat simplistic: it allows both multiple underscores
> >> in succession and underscores at the end of the number literal. Perl
> >> does so too, but with a warning:
> >>
> >> perl -wE'say "true" if 1__000_ == 1000'
> >> # Misplaced _ in number at -e line 1.
> >> # Misplaced _ in number at -e line 1.
> >> # true
> >>
> >> --
> >> Best regards,
> >> Ivan
> >>
> >> ______________________________________________
> >> R-devel at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-devel

Bill Dunlap

2022-Jul-15 19:34 UTC

[Rd] Feature Request: Allow Underscore Separated Numbers

The token '._1' (period underscore digit) is currently parsed as a symbol
(name).  It would become a number if underscore were ignored as in the
first proposal.  The just-between-digits alternative would avoid this
change.

-Bill
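
One way to picture the difference described above is the following editorial sketch. It is not R's lexer: `looks_numeric` is a deliberately simplified stand-in for "this token is a decimal number", and all three function names are hypothetical. The first policy drops every underscore before classifying the token, the second accepts an underscore only when it sits between two digits.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for "is this a decimal number?": only digits and at
   most one '.', with at least one digit. Real R literals are richer. */
static bool looks_numeric(const char *s)
{
    int digits = 0, dots = 0;
    for (; *s != '\0'; s++) {
        if (isdigit((unsigned char) *s)) digits++;
        else if (*s == '.') dots++;
        else return false;
    }
    return digits > 0 && dots <= 1;
}

/* Policy A (assumed reading of "underscore ignored"): remove every
   underscore, then classify. Tokens longer than 63 chars are truncated. */
bool numeric_if_underscores_ignored(const char *tok)
{
    char buf[64];
    size_t n = 0;
    for (; *tok != '\0' && n + 1 < sizeof buf; tok++) {
        if (*tok != '_') buf[n++] = *tok;
    }
    buf[n] = '\0';
    return looks_numeric(buf);
}

/* Policy B: an underscore is accepted only with a digit on both sides;
   any other underscore means the token is not a number. */
bool numeric_if_between_digits_only(const char *tok)
{
    char buf[64];
    size_t n = 0;
    for (size_t i = 0; tok[i] != '\0' && n + 1 < sizeof buf; i++) {
        if (tok[i] == '_') {
            if (i == 0 || !isdigit((unsigned char) tok[i - 1]) ||
                !isdigit((unsigned char) tok[i + 1])) return false;
            continue;
        }
        buf[n++] = tok[i];
    }
    buf[n] = '\0';
    return looks_numeric(buf);
}
```

With these definitions "._1" counts as numeric under policy A (it collapses to ".1") but not under policy B, so under B it would stay available as a symbol, while "1_000" is numeric under both.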

