thr3ads.net - R devel - [Rd] Multiple Assignment built into the R Interpreter? [Mar 2023]

Duncan Murdoch

2023-Mar-11 22:44 UTC

[Rd] Multiple Assignment built into the R Interpreter?

On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
> Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can
> follow all aspects you raised, but to give my limited take on a few:
>
>> your proposal violates a very basic property of the language, i.e.
>> that all statements are expressions and have a value.
>> What's the value of 1 + (A, C = init_matrices()).
>
> I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars); nr
> = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,

   d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)

is not a statement, it is a sequence of 4 statements.

Duncan Murdoch
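
The distinction can be checked with the parser itself. The following is a minimal sketch, assuming only base R and the built-in mtcars data set; the variable names exprs, bad and val are illustrative and not from the thread:

```r
# Parse the line as R source and count its top-level expressions.
exprs <- parse(text = "d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)")
length(exprs)  # one entry per statement

# Putting a sequence of statements inside parentheses cannot be parsed at all.
bad <- try(parse(text = "1 + (d = dim(mtcars); nr = d[1])"), silent = TRUE)
inherits(bad, "try-error")

# Braces turn the sequence into a single expression whose value is the
# value of its last statement, so it can take part in the addition.
val <- 1 + { d = dim(mtcars); nr = d[1]; nc = d[2]; nc }
```

Here parse() reports four expressions for the semicolon-separated line, while the braced form yields one value (the column count plus one).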

> as the above expression should. `%=%` assigns to
> environments, so 1 + (c("A", "C") %=% init_matrices()) returns
> numeric(0), with A and C having their values assigned.
> 
>> suppose f() returns list(A = 1, B = 2) and I do
>> B, A <- f()
>> Should assignment be by position or by name?
> 
> In other languages this is by position. The feature is not meant to 
> replace list2env(), and being able to rename objects in the assignment 
> is a vital feature of codes
> using multi input and output functions e.g. in Matlab or Julia.
> 
>> Honestly, given that this is simply syntactic sugar, I don't think
>> I would support it.
> 
> You can call it that, but it would be used by almost every R user almost 
> every day. Simple things like nr, nc = dim(x); values, vectors = 
> eigen(x) etc. where the creation of intermediate objects
> is cumbersome and redundant.
> 
>> I see you've already mentioned it ("JavaScript-like"). I think it
>> would fulfil Sebastian's requirements too, as long as it is
>> considered "true assignment" by the rest of the language.
> 
> I don't have strong opinions about how the issue is phrased or 
> implemented. Something like [t, n] = dim(x) might even be more clear. 
> It's important though that assignment remains by position,
> so even if some output gets thrown away that should also be positional.
> 
>> A <- 0
>> [A, B = A + 10] <- list(1, A = 2)
>
> I also fail to see the use of allowing this. Something like this is an
> error.
>
> > A = 2
> > (B = A + 1) <- 1
> Error in (B = A + 1) <- 1 : could not find function "(<-"
> 
> Regarding the practical implementation, I think `collapse::%=%` is a
> good starting point. It could be introduced in R as a separate function,
> or `=` could be modified to accommodate its capability. It should be
> clear that with more than one LHS variable the assignment is an
> environment-level operation and the results can only be used in
> computations once assigned to the environment, e.g. in
> 1 + (c("A", "C") %=% init_matrices()), A and C are not available for
> the addition in this statement. The interpreter then needs to be
> modified to read something like nr, nc = dim(x) or [nr, nc] = dim(x)
> as an environment-level multiple assignment operation with no
> immediate value. Appears very feasible to my limited understanding, but
> I guess there are other things to consider still. Definitely appreciate 
> the responses so far though.
> 
> Best regards,
> 
> Sebastian
>
> On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> 
>     On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
>      > On Sat, 11 Mar 2023 11:11:06 -0500
>      > Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>      >
>      >> That's clear, but your proposal violates a very basic property
>      >> of the language, i.e. that all statements are expressions and
>      >> have a value.
>      >
>      > How about reframing this feature request from multiple assignment
>      > (which does go contrary to "everything has only one value, even
>      > if it's sometimes invisible(NULL)") to "structured binding" /
>      > "destructuring assignment" [*], which takes this single value
>      > returned by the expression and subsets it subject to certain
>      > rules? It may be easier to make a decision on the semantics for
>      > destructuring assignment (e.g. languages which have this feature
>      > typically allow throwing unneeded parts of the return value
>      > away), and it doesn't seem to break as much of the rest of the
>      > language if implemented.
>      >
>      > I see you've already mentioned it ("JavaScript-like"). I think it
>      > would fulfil Sebastian's requirements too, as long as it is
>      > considered "true assignment" by the rest of the language.
>      >
>      > The hard part is to propose the actual grammar of the new feature
>      > (in terms of src/main/gram.y, preferably without introducing
>      > conflicts) and its semantics (including the corner cases, some of
>      > which you have already mentioned). I'm not sure I'm up to the task.
>      >
> 
>     If I were doing it, here's what I'd propose:
> 
>         '[' formlist ']' LEFT_ASSIGN expr
>         '[' formlist ']' EQ_ASSIGN expr
>         expr RIGHT_ASSIGN '[' formlist ']'
>
>     where `formlist` has the syntax of the formals list for a function
>     definition.  This would have the following semantics:
>
>          {
>            *tmp* <- expr
>
>            # For arguments with no "default" expression,
>
>            argname1 <- *tmp*[[1]]
>            argname2 <- *tmp*[[2]]
>            ...
>
>            # For arguments with a default listed
>
>            argname3 <- with(*tmp*, default3)
>          }
>
>
>     The value of the whole thing would therefore be (invisibly) the
>     value of the last item in the assignment.
> 
>     Two examples:
> 
>         [A, B, C] <- expr   # assign the first three elements of expr
>                             # to A, B, and C
>
>         [A, B, C = a + b] <- expr  # assign the first two elements of expr
>                                    # to A and B,
>                                    # assign with(expr, a + b) to C.
>
>     Unfortunately, I don't think this could be done entirely by
>     transforming the expression (which is the way |> was done), and that
>     makes it a lot harder to write and to reason about.  E.g. what does
>     this do?
>
>         A <- 0
>         [A, B = A + 10] <- list(1, A = 2)
>
>     According to the recipe above, I think it sets A to 1 and B to 12, but
>     maybe a user would expect B to be 10 or 11.  And according to that
>     recipe this is an error:
>
>         [A, B = A + 10] <- c(1, A = 2)
>
>     which probably isn't what a user would expect, given that this is
>     fine:
>
>         [A, B] <- c(1, 2)
>     Duncan Murdoch
>
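
The recipe quoted above can be approximated in current R with an ordinary function. This is only a sketch under stated assumptions: destructure() is a hypothetical helper that does not exist in base R or in any package mentioned in the thread, the target list is written with alist() because the bracket syntax is not valid R today, and positions are counted over the targets without a default.

```r
# Hypothetical helper emulating the recipe for `[formlist] <- expr`.
# `targets` is an alist() such as alist(A, B = A + 10).
destructure <- function(targets, value, envir = parent.frame()) {
  nms <- names(targets)
  if (is.null(nms)) nms <- rep("", length(targets))
  last <- NULL
  pos <- 0L
  for (i in seq_along(targets)) {
    if (nms[i] == "") {
      # No default: take the next element of the value by position.
      pos <- pos + 1L
      last <- value[[pos]]
      assign(as.character(targets[[i]]), last, envir = envir)
    } else {
      # Default given: evaluate it as with(value, default) would.
      last <- eval(targets[[i]], value, envir)
      assign(nms[i], last, envir = envir)
    }
  }
  invisible(last)  # value of the last item assigned
}

e <- new.env()
assign("A", 0, envir = e)
destructure(alist(A, B = A + 10), list(1, A = 2), envir = e)
A_val <- get("A", envir = e)
B_val <- get("B", envir = e)

# An atomic vector on the right-hand side fails for the target with a
# default, because eval() does not accept such a vector as its data argument.
err <- try(destructure(alist(A, B = A + 10), c(1, A = 2), envir = e),
           silent = TRUE)
```

With the list on the right-hand side, A ends up as 1 and B as 12, because the named element A = 2 of the list shadows the freshly assigned A when the default is evaluated. That is the surprising corner case described in the message.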

Kevin Ushey

2023-Mar-11 23:00 UTC

FWIW, it's possible to get fairly close to your proposed semantics
using the existing metaprogramming facilities in R. I put together a
prototype package here to demonstrate:

    https://github.com/kevinushey/dotty

The package exports an object called `.`, with a special `[<-.dot` S3
method which enables destructuring assignments. This means you can
write code like:

    .[nr, nc] <- dim(mtcars)

and that will define 'nr' and 'nc' as you expect.
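
The general mechanism can be sketched in a few lines. This is not the dotty implementation, only a minimal illustration assuming base R: the class name "dot_sketch" is made up for this example, and the sketch handles plain symbols by position only.

```r
# Hypothetical stand-in for the exported `.` object.
. <- structure(list(), class = "dot_sketch")

# S3 method for `[<-`: the index arguments arrive unevaluated, so their
# expressions can be captured and used as the names to assign to.
`[<-.dot_sketch` <- function(x, ..., value) {
  targets <- as.list(substitute(list(...)))[-1L]
  envir <- parent.frame()
  for (i in seq_along(targets)) {
    assign(as.character(targets[[i]]), value[[i]], envir = envir)
  }
  x  # `[<-` must return the object, which R assigns back to `.`
}

.[nr, nc] <- dim(mtcars)
c(nr, nc)
```

Because the assignment is an ordinary call to a replacement function, it needs no change to the parser. The price is that code analysis tools see nr and nc as undefined variables.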

As for R CMD check warnings, you can suppress those through the use of
globalVariables(), and that can also be automated within the package.
The 'dotty' package includes a function 'dotify()' which automates
looking for such usages in your package, and calling globalVariables()
so that R CMD check doesn't warn. In theory, a similar technique would
be applicable to other packages defining similar operators (zeallot,
collapse).

Obviously, globalVariables() is a very heavy hammer to swing for this
issue, but you might consider the benefits worth the tradeoffs.

Best,
Kevin

> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
