thr3ads.net - R devel - [Rd] Feature Request: Allow Underscore Separated Numbers [Jul 2022]

Bill Dunlap

2022-Jul-15 19:34 UTC

[Rd] Feature Request: Allow Underscore Separated Numbers

The token '._1' (period underscore digit) is currently parsed as a symbol
(name).  It would become a number if underscore were ignored as in the
first proposal.  The just-between-digits alternative would avoid this
change.
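
To make the difference concrete, here is a rough sketch in Python. It is a
toy model only: the two regular expressions are simplified stand-ins for R's
symbol and decimal-literal rules, and the two "policy" functions are
hypothetical names, not a description of what either patch in this thread
actually does inside gram.y.

```python
import re

# Assumption: simplified stand-ins for R's symbol and decimal-literal rules.
SYMBOL = re.compile(r"(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*")
NUMBER = re.compile(r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?L?")

def is_symbol(token):
    return SYMBOL.fullmatch(token) is not None

def is_number(token):
    return NUMBER.fullmatch(token) is not None

def is_number_ignoring_all(token):
    # Hypothetical policy A: drop every underscore before matching.
    return is_number(token.replace("_", ""))

def is_number_between_digits(token):
    # Hypothetical policy B: drop an underscore only when it has a digit
    # on both sides.
    return is_number(re.sub(r"(?<=[0-9])_(?=[0-9])", "", token))

print(is_symbol("._1"))                       # True
print(is_number_ignoring_all("._1"))          # True
print(is_number_between_digits("._1"))        # False
print(is_number_between_digits("1_000_000"))  # True
```

Under policy A the existing name '._1' would start matching as the number .1,
while under policy B it stays a name and 1_000_000 is still accepted.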

-Bill

On Fri, Jul 15, 2022 at 12:26 PM Jim Hester <james.f.hester at gmail.com>
wrote:
> I think keeping it simple and less restrictive is the best approach,
> for ease of implementation, limiting future maintenance, and so users
> have the flexibility to format these however they wish. So I would
> probably lean towards allowing multiple delimiters anywhere (including
> trailing) or possibly just between digits.
>
> On Fri, Jul 15, 2022 at 2:26 PM Duncan Murdoch <murdoch.duncan at gmail.com>
> wrote:
> >
> > Thanks for posting that list.  The Python document is the only one I've
> > read so far; it has a really nice summary
> > (https://peps.python.org/pep-0515/#prior-art) of the differences in
> > implementations among 10 languages.  Which choice would you recommend,
> > and why?
> >
> >   - I think Ivan's quick solution doesn't quite match any of them.
> >   - C, Fortran and C++ have special support in R, but none of them use
> > underscore separators.
> >   - C++ does support separators, but uses "'", not "_", and some ancient
> > forms of Fortran ignore embedded spaces.
> >
> > Duncan Murdoch
> >
> > On 15/07/2022 1:58 p.m., Jim Hester wrote:
> > > Allowing underscores in numeric literals is becoming a very common
> > > feature in computing languages. All of these languages (and more) now
> > > support it
> > >
> > > python: https://peps.python.org/pep-0515/
> > > javascript: https://v8.dev/features/numeric-separators
> > > julia: https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Floating-Point-Numbers
> > > java: https://docs.oracle.com/javase/7/docs/technotes/guides/language/underscores-literals.html#:~:text=In%20Java%20SE%207%20and,the%20readability%20of%20your%20code.
> > > ruby: https://docs.ruby-lang.org/en/2.0.0/syntax/literals_rdoc.html#label-Numbers
> > > perl: https://perldoc.perl.org/perldata#Scalar-value-constructors
> > > rust: https://doc.rust-lang.org/rust-by-example/primitives/literals.html
> > > C#: https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types#real-literals
> > > go: https://go.dev/ref/spec#Integer_literals
> > >
> > > Its use in this context also dates back to at least Ada 83
> > > (http://archive.adaic.com/standards/83lrm/html/lrm-02-04.html#:~:text=A%20decimal%20literal%20is%20a,the%20base%20is%20implicitly%20ten).&text=An%20underline%20character%20inserted%20between,value%20of%20this%20numeric%20literal.)
> > >
> > > Many other communities see the benefit of this feature; I think R's
> > > community would benefit from it as well.
> > >
> > > On Fri, Jul 15, 2022 at 1:22 PM Ivan Krylov <krylov.r00t at gmail.com>
> > > wrote:
> > >>
> > >> On Fri, 15 Jul 2022 11:25:32 -0400
> > >> <avi.e.gross at gmail.com> wrote:
> > >>
> > >>> R normally delays evaluation so chunks of code are handed over
> > >>> untouched to functions that often play with the text directly without
> > >>> evaluating it until, perhaps, much later.
> > >>
> > >> Do they play with the text, or with the syntax tree after it went
> > >> through the parser? While it's true that R saves the source text of the
> > >> functions for ease of debugging, it's not guaranteed that a given
> > >> object will have source references, and typical NSE functions operate
> > >> on language objects which are tree-like structures containing R values,
> > >> not source text.
> > >>
> > >> You are, of course, right that any changes to the syntax of the
> > >> language must be carefully considered, but if anyone wants to play with
> > >> this idea, it can be implemented in a very simple manner:
> > >>
> > >> --- src/main/gram.y     (revision 82598)
> > >> +++ src/main/gram.y     (working copy)
> > >> @@ -2526,7 +2526,7 @@
> > >>       YYTEXT_PUSH(c, yyp);
> > >>       /* We don't care about other than ASCII digits */
> > >>       while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
> > >> -          || c == 'x' || c == 'X' || c == 'L')
> > >> +          || c == 'x' || c == 'X' || c == 'L' || c == '_')
> > >>       {
> > >>          count++;
> > >>          if (c == 'L') /* must be at the end.  Won't allow 1Le3 (at present). */
> > >> @@ -2533,6 +2533,9 @@
> > >>          {   YYTEXT_PUSH(c, yyp);
> > >>              break;
> > >>          }
> > >> +       if (c == '_') { /* allow an underscore anywhere inside the literal */
> > >> +           continue;
> > >> +       }
> > >>
> > >>          if (c == 'x' || c == 'X') {
> > >>              if (count > 2 || last != '0') break;  /* 0x must be first */
> > >>
> > >> To NSE functions, the underscored literals are indistinguishable from
> > >> normal ones, because they don't see the literal text, only the parsed
> > >> value:
> > >>
> > >> stopifnot(all.equal(\() 1000000, \() 1_000_000))
> > >> f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
> > >> f(1e6, 1_000_000)
> > >>
> > >> Although it's true that the source references change as a result:
> > >>
> > >> lapply(
> > >>   list(\() 1000000, \() 1_000_000),
> > >>   \(.) as.character(getSrcref(.))
> > >> )
> > >> # [[1]]
> > >> # [1] "\\() 1000000"
> > >> #
> > >> # [[2]]
> > >> # [1] "\\() 1_000_000"
> > >>
> > >> This patch is somewhat simplistic: it allows both multiple underscores
> > >> in succession and underscores at the end of the number literal. Perl
> > >> does so too, but with a warning:
> > >>
> > >> perl -wE'say "true" if 1__000_ == 1000'
> > >> # Misplaced _ in number at -e line 1.
> > >> # Misplaced _ in number at -e line 1.
> > >> # true
> > >>
> > >> --
> > >> Best regards,
> > >> Ivan
> > >>
> > >> ______________________________________________
> > >> R-devel at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-devel

Ivan Krylov

2022-Jul-16 09:24 UTC

On Fri, 15 Jul 2022 12:34:24 -0700
Bill Dunlap <williamwdunlap at gmail.com> wrote:
> The token '._1' (period underscore digit) is currently parsed as a
> symbol (name).  It would become a number if underscore were ignored
> as in the first proposal.  The just-between-digits alternative would
> avoid this change.

Thanks for spotting this! Here's a patch that allows underscores
only between digits and only inside the significand of a number:

--- src/main/gram.y	(revision 82598)
+++ src/main/gram.y	(working copy)
@@ -2526,7 +2526,7 @@
     YYTEXT_PUSH(c, yyp);
     /* We don't care about other than ASCII digits */
     while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
-	   || c == 'x' || c == 'X' || c == 'L')
+	   || c == 'x' || c == 'X' || c == 'L' || c == '_')
     {
 	count++;
 	if (c == 'L') /* must be at the end.  Won't allow 1Le3 (at present). */
@@ -2538,11 +2538,16 @@
 	    if (count > 2 || last != '0') break;  /* 0x must be first */
 	    YYTEXT_PUSH(c, yyp);
 	    while(isdigit(c = xxgetc()) || ('a' <= c && c <= 'f') ||
-		  ('A' <= c && c <= 'F') || c == '.') {
+		  ('A' <= c && c <= 'F') || c == '.' || c == '_') {
 		if (c == '.') {
 		    if (seendot) return ERROR;
 		    seendot = 1;
 		}
+		if (c == '_') {
+		    /* disallow underscores following 0x or followed by non-digit */
+		    if (nd == 0 || typeofnext() >= 2) break;
+		    continue;
+		}
 		YYTEXT_PUSH(c, yyp);
 		nd++;
 	    }
@@ -2588,6 +2593,11 @@
 		break;
 	    seendot = 1;
 	}
+	/* underscores in significand followed by a digit must be skipped */
+	if (c == '_') {
+	    if (seenexp || typeofnext() >= 2) break;
+	    continue;
+	}
 	YYTEXT_PUSH(c, yyp);
 	last = c;
     }
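
For anyone who wants to experiment with the rule outside the parser, the
stated intent (underscores only between digits, and only in the significand)
can be sketched in Python roughly as follows. This is an illustration under
assumptions, not a translation of the patch: strip_separators is a
hypothetical helper, it returns None where the lexer would simply stop
scanning the token, and it does not otherwise check that the input is a
well-formed number.

```python
DEC_DIGITS = set("0123456789")
HEX_DIGITS = DEC_DIGITS | set("abcdefABCDEF")

def strip_separators(literal):
    """Return literal without underscores, or None if one is misplaced."""
    is_hex = literal[:2] in ("0x", "0X")
    digits = HEX_DIGITS if is_hex else DEC_DIGITS
    # Hex literals use p/P for the exponent, decimal ones use e/E.
    exp_marks = set("pP") if is_hex else set("eE")
    seen_exp = False
    out = []
    for i, ch in enumerate(literal):
        if ch == "_":
            prev_ch = literal[i - 1] if i > 0 else ""
            next_ch = literal[i + 1] if i + 1 < len(literal) else ""
            # Reject underscores in the exponent or not flanked by digits.
            if seen_exp or prev_ch not in digits or next_ch not in digits:
                return None
            continue
        if ch in exp_marks:
            seen_exp = True
        out.append(ch)
    return "".join(out)

print(strip_separators("1_000_000"))  # 1000000
print(strip_separators("0xFF_FF"))    # 0xFFFF
print(strip_separators("1__000"))     # None
print(strip_separators("1e1_0"))      # None
print(strip_separators("._1"))        # None
```

The sketch is stricter than Perl's behaviour quoted earlier in the thread:
doubled and trailing underscores are rejected rather than accepted with a
warning.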


-- 
Best regards,
Ivan
