Duncan Murdoch
2023-Mar-11 22:44 UTC
[Rd] Multiple Assignment built into the R Interpreter?
On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
> Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can
> follow all aspects you raised, but to give my limited take on a few:
>
>> your proposal violates a very basic property of the language, i.e.
>> that all statements are expressions and have a value.
>> What's the value of 1 + (A, C = init_matrices()).
>
> I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars); nr
> = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,

    d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)

is not a statement, it is a sequence of 4 statements.

Duncan Murdoch

> as the
> above expression should. `%=%` assigns to
> environments, so 1 + (c("A", "C") %=% init_matrices()) returns
> numeric(0), with A and C having their values assigned.
>
>> suppose f() returns list(A = 1, B = 2) and I do
>> B, A <- f()
>> Should assignment be by position or by name?
>
> In other languages this is by position. The feature is not meant to
> replace list2env(), and being able to rename objects in the assignment
> is a vital feature of codes
> using multi input and output functions e.g. in Matlab or Julia.
>
>> Honestly, given that this is simply syntactic sugar, I don't think I
>> would support it.
>
> You can call it that, but it would be used by almost every R user almost
> every day. Simple things like nr, nc = dim(x); values, vectors =
> eigen(x) etc. where the creation of intermediate objects
> is cumbersome and redundant.
>
>> I see you've already mentioned it ("JavaScript-like"). I think it
>> would fulfil Sebastian's requirements too, as long as it is considered
>> "true assignment" by the rest of the language.
>
> I don't have strong opinions about how the issue is phrased or
> implemented. Something like [t, n] = dim(x) might even be more clear.
> It's important though that assignment remains by position,
> so even if some output gets thrown away that should also be positional.
>
>> A <- 0
>> [A, B = A + 10] <- list(1, A = 2)
>
> I also fail to see the use of allowing this. Something like this is an
> error.
>
> > A = 2
> > (B = A + 1) <- 1
> Error in (B = A + 1) <- 1 : could not find function "(<-"
>
> Regarding the practical implementation, I think `collapse::%=%` is a
> good starting point. It could be introduced in R as a separate function,
> or `=` could be modified to accommodate its capability. It should be
> clear that
> with more than one LHS variable the assignment is an environment level
> operation and the results can only be used in computations once assigned
> to the environment, e.g. as in 1 + (c("A", "C") %=% init_matrices()),
> A and C are not available for the addition in this statement. The
> interpreter then needs to be modified to read something like nr, nc =
> dim(x) or [nr, nc] = dim(x) as an environment-level multiple assignment
> operation with no
> immediate value. Appears very feasible to my limited understanding, but
> I guess there are other things to consider still. Definitely appreciate
> the responses so far though.
>
> Best regards,
>
> Sebastian
>
>
> On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch
> <murdoch.duncan at gmail.com> wrote:
>
>     On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
>      > On Sat, 11 Mar 2023 11:11:06 -0500
>      > Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>      >
>      >> That's clear, but your proposal violates a very basic property
>      >> of the language, i.e. that all statements are expressions and
>      >> have a value.
>      >
>      > How about reframing this feature request from multiple assignment
>      > (which does go contrary to "everything has only one value, even
>      > if it's sometimes invisible(NULL)") to "structured binding" /
>      > "destructuring assignment" [*], which takes this single value
>      > returned by the expression and subsets it subject to certain
>      > rules? It may be easier to make a decision on the semantics for
>      > destructuring assignment (e.g. languages which have this feature
>      > typically allow throwing unneeded parts of the return value
>      > away), and it doesn't seem to break as much of the rest of the
>      > language if implemented.
>      >
>      > I see you've already mentioned it ("JavaScript-like"). I think it
>      > would fulfil Sebastian's requirements too, as long as it is
>      > considered "true assignment" by the rest of the language.
>      >
>      > The hard part is to propose the actual grammar of the new feature
>      > (in terms of src/main/gram.y, preferably without introducing
>      > conflicts) and its semantics (including the corner cases, some of
>      > which you have already mentioned). I'm not sure I'm up to the
>      > task.
>      >
>
>     If I were doing it, here's what I'd propose:
>
>         '[' formlist ']' LEFT_ASSIGN expr
>         '[' formlist ']' EQ_ASSIGN expr
>         expr RIGHT_ASSIGN '[' formlist ']'
>
>     where `formlist` has the syntax of the formals list for a function
>     definition. This would have the following semantics:
>
>         {
>           *tmp* <- expr
>
>           # For arguments with no "default" expression,
>
>           argname1 <- *tmp*[[1]]
>           argname2 <- *tmp*[[2]]
>           ...
>
>           # For arguments with a default listed
>
>           argname3 <- with(*tmp*, default3)
>         }
>
>     The value of the whole thing would therefore be (invisibly) the
>     value of the last item in the assignment.
>
>     Two examples:
>
>         [A, B, C] <- expr   # assign the first three elements of expr
>                             # to A, B, and C
>
>         [A, B, C = a + b] <- expr  # assign the first two elements of expr
>                                    # to A and B,
>                                    # assign with(expr, a + b) to C.
>
>     Unfortunately, I don't think this could be done entirely by
>     transforming the expression (which is the way |> was done), and
>     that makes it a lot harder to write and to reason about. E.g. what
>     does this do?
>
>         A <- 0
>         [A, B = A + 10] <- list(1, A = 2)
>
>     According to the recipe above, I think it sets A to 1 and B to 12,
>     but maybe a user would expect B to be 10 or 11. And according to
>     that recipe this is an error:
>
>         [A, B = A + 10] <- c(1, A = 2)
>
>     which probably isn't what a user would expect, given that this is
>     fine:
>
>         [A, B] <- c(1, 2)
>
>     Duncan Murdoch
>
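The quoted recipe can be tried out in current R without any grammar change. What follows is only a rough sketch: `destructure()` is a hypothetical helper (it is not part of R or of any package mentioned in this thread), an `alist()` stands in for the bracketed `formlist`, and assigning by position within the formlist is an assumption where the recipe leaves the indexing open.

```r
# Hypothetical helper that emulates the quoted recipe; not part of R.
# `targets` is an alist() standing in for the bracketed formlist.
destructure <- function(targets, value, envir = parent.frame()) {
  tmp <- value                                  # plays the role of *tmp*
  nms <- names(targets)
  if (is.null(nms)) nms <- rep("", length(targets))
  has_default <- nzchar(nms)
  last <- NULL

  # Entries without a "default": assign by position
  # (assumption: position within the formlist).
  for (i in which(!has_default)) {
    last <- tmp[[i]]
    assign(as.character(targets[[i]]), last, envir = envir)
  }

  # Entries with a default: evaluate it the way with(tmp, default) would.
  for (i in which(has_default)) {
    last <- eval(targets[[i]], tmp, envir)
    assign(nms[i], last, envir = envir)
  }

  invisible(last)                               # value of the last assignment
}

A <- 0
destructure(alist(A, B = A + 10), list(1, A = 2))
A  # 1
B  # 12
```

Under these assumptions the sketch reproduces both corner cases from the quoted message: with a list on the right-hand side B becomes 12, because `A` is looked up in the list first, while `c(1, A = 2)` fails, because an atomic vector cannot serve as an evaluation environment.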
FWIW, it's possible to get fairly close to your proposed semantics
using the existing metaprogramming facilities in R. I put together a
prototype package here to demonstrate:

    https://github.com/kevinushey/dotty

The package exports an object called `.`, with a special `[<-.dot` S3
method which enables destructuring assignments. This means you can
write code like:

    .[nr, nc] <- dim(mtcars)

and that will define 'nr' and 'nc' as you expect.

As for R CMD check warnings, you can suppress those through the use of
globalVariables(), and that can also be automated within the package.
The 'dotty' package includes a function 'dotify()' which automates
looking for such usages in your package, and calling globalVariables()
so that R CMD check doesn't warn. In theory, a similar technique would
be applicable to other packages defining similar operators (zeallot,
collapse).

Obviously, globalVariables() is a very heavy hammer to swing for this
issue, but you might consider the benefits worth the tradeoffs.

Best,
Kevin

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
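To illustrate how a `[<-` S3 method can produce the `.[nr, nc] <- dim(mtcars)` behaviour described above, here is a minimal sketch. It is not the source of the dotty package: the `.` object and the method body below are assumptions written for illustration, and the sketch handles only plain positional names.

```r
# Hypothetical stand-in for the exported `.` object; NOT dotty's source.
. <- structure(list(), class = "dot")

# S3 method for `[<-`: invoked for `.[a, b] <- value`.
`[<-.dot` <- function(x, ..., value) {
  # Capture the bracketed names unevaluated, as a list of symbols.
  targets <- eval(substitute(alist(...)))
  envir <- parent.frame()   # the frame where the assignment was written

  # Assign by position: the i-th name receives the i-th element of `value`.
  for (i in seq_along(targets)) {
    assign(as.character(targets[[i]]), value[[i]], envir = envir)
  }

  invisible(x)              # `.` itself is left unchanged
}

.[nr, nc] <- dim(mtcars)
nr  # 32
nc  # 11
```

Because the names inside the brackets reach the method as unevaluated promises, they do not have to exist beforehand. That is also why static checks see `nr` and `nc` as undefined globals, which is the R CMD check issue that globalVariables() is used to silence.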