thr3ads.net - R devel - [Rd] Feature Request: Allow Underscore Separated Numbers [Jul 2022]

avi.e.gross at gmail.com

2022-Jul-15 15:25 UTC

[Rd] Feature Request: Allow Underscore Separated Numbers

André,

I am not saying a change cannot be done and am not familiar enough with the
internals of R. If you just want the interpreter to evaluate CONSTANTS in
the code as what you consider syntactic sugar and replace 1_000 with 1000,
that sounds superficially possible. But is it?

R normally delays evaluation so chunks of code are handed over untouched to
functions that often play with the text directly without evaluating it
until, perhaps, much later. And I have pointed out how much work is done
with things like regular expressions or reading things in from a file that
is not done in the REPL but in functions behind the scenes. So if there is
any way for a number to slide in without being modified, or places where you
want the darn underscores preserved, you may well cause a glitch.

Languages that design in the ability have obviously dealt with issues and
presumably anyone writing code anew can use a new definition in their work
so they handle such numbers. I am not saying such a change cannot be done,
simply that existing languages are careful about making changes as they
strive to retain compatibility.

So even assuming your statement about not needing to change as.numeric or
read.csv functions is true, aren't you introducing a change in which the
users will inadvertently use the feature in strings or files and assume it
is a globally recognized feature? I use CSV files and other such formats
quite a bit as a way to exchange data between R and other environments and
unless they all change and allow underscores in numbers, there can be
issues. So, yes, you are suggesting nothing in R will write out numbers with
underscores. But if others do and you import the data into R with a reader
that does not understand them, we have anomalies.

I am not arguing with anyone about this. Like many proposed features, it
sounds reasonable just by itself. But for a language that was crafted and
then modified many times, the burden is often on those wanting a change to
convince us that it can be done benignly, effectively and cheaply AND that
it is more worthwhile than a thousand other pending ideas already submitted.

I have never used str2lang() directly in my life, so would changing that
really help if as.numeric() and other such functions were left alone and did
not call it? What if I read in a .CSV a line at a time, use various methods
including regular expressions to split the line into parts, and then make
the parts into numbers with some primitive algorithm that maps the digits
0-9 into small integers 0-9, positionally multiplies the digits to the left
by 10 for each level, and adds them up? Will that algorithm know about
underscores and not only ignore them but also keep track of how many times
it multiplies the other parts by 10? Sure, we can write a new algorithm with
added complexity but, in my view, we can solve the problem in the few cases
where it matters without such a change.
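
The digit-by-digit conversion described above can be sketched in C, the
language of R's own parser. This is only an illustration, not code from R:
the names parse_digits_strict and parse_digits_underscore are hypothetical,
the input is assumed to be a non-negative ASCII decimal integer, and
overflow is not checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of R: accumulate ASCII digits positionally.
 * Returns the value, or -1 if the string is empty or contains anything
 * other than the digits 0-9 (so "1_000" is rejected). No overflow check. */
long parse_digits_strict(const char *s)
{
    assert(s != NULL);
    if (*s == '\0') return -1;
    long value = 0;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9') return -1;
        value = value * 10 + (*s - '0');   /* shift left one place, add digit */
    }
    return value;
}

/* Same loop, but an underscore is skipped without multiplying by 10,
 * so the positional weights of the remaining digits stay correct.
 * Returns -1 if no digit was seen or another character appears. */
long parse_digits_underscore(const char *s)
{
    assert(s != NULL);
    long value = 0;
    int seen_digit = 0;
    for (; *s != '\0'; s++) {
        if (*s == '_') continue;           /* separator: no place shift */
        if (*s < '0' || *s > '9') return -1;
        value = value * 10 + (*s - '0');
        seen_digit = 1;
    }
    return seen_digit ? value : -1;
}
```

With these definitions, the strict version fails on "1_000" while the
underscore-aware version reads it as 1000; existing hand-written converters
of the first kind would need the extra branch to accept such input.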

Had this been built in originally, maybe not a problem. But consider the
enormous expense of UNICODE and the truly major upheaval needed to get it
working at a time when lots of code using pointers had a reasonable
expectation that all characters took up the same number of bytes, and
calculating the length of a string could be done by simply subtracting one
pointer from another. Now, you actually have to read the entire string and
count code points, or keep the length as a part of the structure that is
changed any time it changes and so on.
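
The difference between the two ways of measuring a string can be sketched in
C. This is a minimal illustration that assumes the input is valid UTF-8; the
function names byte_length and utf8_code_points are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length by pointer subtraction: find the terminator, then subtract. */
size_t byte_length(const char *s)
{
    assert(s != NULL);
    const char *end = s;
    while (*end != '\0') end++;
    return (size_t)(end - s);
}

/* Code point count, assuming valid UTF-8: every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new code point,
 * so the whole string has to be read. */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    }
    return n;
}
```

For pure ASCII text the two functions agree; for a string such as
"caf\xC3\xA9" (the UTF-8 encoding of "café") the byte length is 5 while the
code point count is 4.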

But arguably UNICODE support is now required in many cases. So, yes,
underscores in numbers may become commonplace and cause headaches for a
while. But mathematically, I don't see them as needed and see many ways to
allow a programmer to see what a number is without any problems in the few
times they want it. Cut and paste in code can easily take out any snippet
accurately and plug it into a function that displays it with commas or
whatever. But definitely, lazy humans constantly make mistakes and even with
this would still make some.

But if R developers seem confident this change can be done, go for it!
Numeric literals, like other constants, have often been something compiled
languages have optimized out of the way, such as combining multiple
instances of the same one into one memory location.

Avi

From: GILLIBERT, Andre <Andre.Gillibert at chu-rouen.fr> 
Sent: Friday, July 15, 2022 2:31 AM
To: avi.e.gross at gmail.com; r-devel at r-project.org
Subject: RE: [Rd] Feature Request: Allow Underscore Separated Numbers

On 2022-07-14 8:21 p.m., avi.e.gross at gmail.com wrote:
> Devin,
>
> I cannot say anyone wants to tweak R after the fact to accept numeric
> items with underscores as that might impact all kinds of places.
>

I am not sure that the feature request of Devin Marlin was correctly
understood.

I guess that he thought about adding syntactic sugar to numeric literals in
the language.

Functions such as as.numeric(), or read.csv() would not be changed.

The main difference would be to make valid code that currently is a "syntax
error", such as:
> 3*100_000
Error: unexpected input in "3*100_"

Breaking code with that feature is possible but improbable.

Indeed, code that expects str2lang("3*100_000") to raise a syntax error
(and catches that error with try) would break.

Most code generating other code then parsing it with str2lang() should be
fine, because it would generate old-style code with normal numeric
constants.

-- 

Sincerely

André GILLIBERT


Ivan Krylov

2022-Jul-15 17:21 UTC


[Rd] Feature Request: Allow Underscore Separated Numbers

On Fri, 15 Jul 2022 11:25:32 -0400
<avi.e.gross at gmail.com> wrote:
> R normally delays evaluation so chunks of code are handed over
> untouched to functions that often play with the text directly without
> evaluating it until, perhaps, much later.
Do they play with the text, or with the syntax tree after it went
through the parser? While it's true that R saves the source text of the
functions for ease of debugging, it's not guaranteed that a given
object will have source references, and typical NSE functions operate
on language objects which are tree-like structures containing R values,
not source text.

You are, of course, right that any changes to the syntax of the
language must be carefully considered, but if anyone wants to play with
this idea, it can be implemented in a very simple manner:

--- src/main/gram.y	(revision 82598)
+++ src/main/gram.y	(working copy)
@@ -2526,7 +2526,7 @@
     YYTEXT_PUSH(c, yyp);
     /* We don't care about other than ASCII digits */
     while (isdigit(c = xxgetc()) || c == '.' || c == 'e' || c == 'E'
-	   || c == 'x' || c == 'X' || c == 'L')
+	   || c == 'x' || c == 'X' || c == 'L' || c == '_')
     {
 	count++;
 	if (c == 'L') /* must be at the end.  Won't allow 1Le3 (at present). */
@@ -2533,6 +2533,9 @@
 	{   YYTEXT_PUSH(c, yyp);
 	    break;
 	}
+	if (c == '_') { /* allow an underscore anywhere inside the literal */
+	    continue;
+	}
 	
 	if (c == 'x' || c == 'X') {
 	    if (count > 2 || last != '0') break;  /* 0x must be first */

To an NSE function, the underscored literals are indistinguishable from
normal ones, because they don't see the literals:

stopifnot(all.equal(\() 1000000, \() 1_000_000))
f <- function(x, y) stopifnot(all.equal(substitute(x), substitute(y)))
f(1e6, 1_000_000)

Although it's true that the source references change as a result:

lapply(
 list(\() 1000000, \() 1_000_000),
 \(.) as.character(getSrcref(.))
)
# [[1]]
# [1] "\\() 1000000"
# 
# [[2]]
# [1] "\\() 1_000_000"

This patch is somewhat simplistic: it allows both multiple underscores
in succession and underscores at the end of the number literal. Perl
does so too, but with a warning:

perl -wE'say "true" if 1__000_ == 1000'
# Misplaced _ in number at -e line 1.
# Misplaced _ in number at -e line 1.
# true
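
A stricter rule than the one in the patch could require every underscore to
sit between two digits. The following C sketch shows one way to express that
check. It is not part of the patch or of R: the function name
underscores_well_placed is hypothetical, and only decimal digits are
considered (a hexadecimal literal would need isxdigit() instead).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical check, not from the patch: returns 1 if every '_' in lit
 * has an ASCII decimal digit immediately before and after it, else 0.
 * This rejects leading, trailing, and doubled underscores. */
int underscores_well_placed(const char *lit)
{
    assert(lit != NULL);
    for (size_t i = 0; lit[i] != '\0'; i++) {
        if (lit[i] != '_') continue;
        if (i == 0) return 0;                               /* leading '_' */
        if (!isdigit((unsigned char)lit[i - 1])) return 0;  /* e.g. "1__0" */
        /* lit[i + 1] is at worst the terminator, which is not a digit */
        if (!isdigit((unsigned char)lit[i + 1])) return 0;  /* trailing '_' */
    }
    return 1;
}
```

Under this rule "1_000" and "1_000_000" pass, while the "1__000_" from the
Perl example above is rejected rather than accepted with a warning.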

-- 
Best regards,
Ivan
