>>>>> William Dunlap >>>>> on Fri, 21 Feb 2020 14:05:49 -0800 writes:> If we change the behavior NULL--[[--assignment from > `[[<-`(NULL, 1, "a" ) # gives "a" (*not* a list) > to > `[[<-`(NULL, 1, "a" ) # gives list("a") > then we have more consistency there *and* your bug is fixed too. > Of course, in other situations back-compatibility would be > broken as well. > Would that change the result of > L <- list(One=1) ; L$Two[[1]] <- 2 > from the current list(One=1,Two=2) to list(One=1, Two=list(2)) > and the result of > F <- 1L ; levels(F)[[1]] <- "one" > from structure(1L, levels="one") to structure(1L, levels=list("one"))? Yes (twice). This is indeed what happens in current R-devel, as I had committed the proposition above yesterday. So R-devel (with svn rev >= 77844 ) does this :> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)list(One = 1, Two = list(2))> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)structure(1L, .Label = list("one"))>but I find that still considerably more logical than current (pre R-devel) R's> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)list(One = 1, Two = 2)> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)list(One = 1, Two = list(2:3))> > F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)structure(1L, .Label = "one")> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ; dput(F)structure(1L, .Label = list(c("one", "TWO")))>> This change would make L$Name[[1]] <- value act like L$Name$one <- value > in cases when L did not have a component named "Name" and value > had length 1. (I don't entirely get what you mean, but) indeed, the [[<- assignments will be closer to corresponding $<- assignments... which I thought would be another good thing about the change. > I have seen users use [[<- where [<- is more appropriate in cases like > this. Should there be a way to generate warnings about the change in > behavior as you've done with other syntax changes? Well, good question. I'd guess one would get such warnings "all over the place", and if a warning is given only once per session it may not be effective ... also the warning be confusing to the 99.9% of R users who don't even get what we are talking about here ;-) Thank you for your comments.. I did not get too many. Martin > Bill Dunlap > TIBCO Software > wdunlap tibco.com > On Wed, Feb 19, 2020 at 12:59 PM Martin Maechler <maechler at stat.math.ethz.ch> > wrote: >> >>>>> Martin Maechler >> >>>>> on Wed, 19 Feb 2020 18:06:57 +0100 writes: >> >> >>>>> Serguei Sokol >> >>>>> on Wed, 19 Feb 2020 15:21:21 +0100 writes: >> >> >> Hi, >> >> I was bitten by a little incoherence in dimnames assignment or may >> be I >> >> missed some point. >> >> Here is the case. If I assign row names via dimnames(a)[[1]], when >> >> nrow(a)=1 then an error is thrown. But if I do the same when >> nrow(a) > 1 >> >> it's OK. Is one of this case works unexpectedly? Both? Neither? >> >> >> a=as.matrix(1) >> >> dimnames(a)[[1]]="a" # error: 'dimnames' must be a list >> >> >> aa=as.matrix(1:2) >> >> dimnames(aa)[[1]]=c("a", "b") # OK >> >> >> In the second case, dimnames(aa) is not a list (like in the first >> case) >> >> but it works. >> >> I would expect that the both work or neither. >> >> > I agree (even though I'm strongly advising people to use '<-' >> > instead of '='); >> > which in this case helps you get the name of the function really >> > involved: It is `dimnames<-` (which is implemented in C >> > entirely, for matrices and other arrays). >> >> As a matter of fact, I wrote too quickly, the culprit here is >> the `[[<-` function (rather than `dimnames<-`), >> which has a special "inconsistency" feature when used to "add to NULL"; >> almost surely inherited from S, but I now think we should >> consider dropping on the occasion of aiming for R 4.0.0 : >> >> It's documented in ?Extract that length 1 `[[.]]`-assignment works >> specially for NULL (and dimnames(.) are NULL here). >> >> Note you need to read and understand one of the tougher sections >> in the official 'R Language Definition' Manual, >> section -- 3.4.4 Subset assignment --- >> i.e., >> >> https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment >> >> notably this part: >> >> Nesting of complex assignments is evaluated recursively >> >> names(x)[3] <- "Three" >> >> is equivalent to >> >> `*tmp*` <- x >> x <- "names<-"(`*tmp*`, value="[<-"(names(`*tmp*`), 3, value="Three")) >> rm(`*tmp*`) >> >> and then, apply this to our dimnames(a)[[1]] <- "a" >> and so replace >> >> - 'names<-' by 'dimnames<-' >> - '[<-' by '[[<-' >> >> ---------- >> >> Here is the rest of my analysis as valid R code >> {this is not new, Peter Dalgaard had explained this 10 or 20 >> years ago to a mailing list audience IIRC} : >> >> ## MM: The problematic behavior (bug ?) is in `[[<-`, not in `dimnames<-` : >> >> `[[<-`(NULL, 1, "a" ) # gives "a" (*not* a list) >> `[[<-`(NULL, 1, c("a","b")) # gives list(c("a","b")) !! >> >> ##==> in C code: in subassign.c [ ~/R/D/r-devel/R/src/main/subassign.c ] >> ##==> function (~ 340 lines) >> ## do_subassign2_dflt(SEXP call, SEXP op, SEXP args, SEXP rho) >> ## has >> " >> line svn r. svn auth. c.o.d.e... >> ---- ------ --------- ---------------------------------------------- >> 1741 4166 ihaka if (isNull(x)) { >> 1742 45446 ripley if (isNull(y)) { >> 1743 76166 luke UNPROTECT(2); /* args, y */ >> 1744 4166 ihaka return x; >> 1745 45446 ripley } >> 1746 35680 murdoch if (length(y) == 1) >> 1747 68094 luke x = allocVector(TYPEOF(y), 0); >> 1748 24954 ripley else >> 1749 68094 luke x = allocVector(VECSXP, 0); >> 1750 1820 ihaka } >> ---- ------ --------- ---------------------------------------------- >> " >> ## so clearly, in case the value is of length 1, no list is created . >> >> ## For dimnames<- Replacing NULL by list() should be done in both cases >> , and then things work : >> `[[<-`(list(), 1, "a" ) # gives list( "a" ) >> `[[<-`(list(), 1, c("a","b")) # gives list(c("a","b")) !! >> >> ## but the problem here is that `[[<-` at this time in the game >> ## does *not* know that it comes from dimnames<- .... >> >> --------------- >> >> If we change the behavior NULL--[[--assignment from >> >> `[[<-`(NULL, 1, "a" ) # gives "a" (*not* a list) >> >> to >> >> `[[<-`(NULL, 1, "a" ) # gives list("a") >> >> >> then we have more consistency there *and* your bug is fixed too. >> Of course, in other situations back-compatibility would be >> broken as well. >> >> At the moment, I think we (R Core Team) should consider doing >> that here. >> >> Martin >> >> >> >> >> Your thoughts are welcome. >> >> > I think we'd be happy if you report this formally on R's >> > bugzilla - https://bugs.r-project.org/ - as a bug. >> --> https://www.r-project.org/bugs.html >> >> >> From reading bugs.html you note that you should ask for >> >> an account there; >> > as I'm one of the people who get such request by e-mail, >> > in this case, I can do it directly (if you confirm you'd >> > want in a private e-mail). >> >> >> Best, Serguei. >> >> >> PS the same apply for dimnames(a)[[2]]<-. >> >> > (of course) >> >> > NB *and* importantly, the buglet is still in current >> > versions of R >> >> > Best, Martin >> >> >>> sessionInfo() >> >> R version 3.6.1 (2019-07-05) Platform: >> >> x86_64-pc-linux-gnu (64-bit) Running under: Mageia 7 >> >> >> Matrix products: default BLAS/LAPACK: >> >> /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so >> >> >> locale: [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C [3] >> >> LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 [5] >> >> LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 [7] >> >> LC_PAPER=fr_FR.UTF-8 LC_NAME=C [9] >> >> LC_ADDRESS=C LC_TELEPHONE=C [11] >> >> LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C >> >> >> attached base packages: [1] parallel stats graphics >> >> grDevices utils datasets methods [8] base >> >> >> other attached packages: [1] multbxxc_1.0.1 >> >> rmumps_5.2.1-11 [3] arrApply_2.1 >> >> RcppArmadillo_0.9.800.4.0 [5] Rcpp_1.0.3 >> >> slam_0.1-47 [7] nnls_1.4 >> >> >> loaded via a namespace (and not attached): [1] >> >> compiler_3.6.1 tools_3.6.1 codetools_0.2-16 >> >> >> ______________________________________________ >> >> R-devel at r-project.org mailing list >> >> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> > ______________________________________________ >> > R-devel at r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-devel >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >>
Martin Maechler
2020-Feb-22 20:43 UTC
[Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?]
>>>>> Martin Maechler >>>>> on Sat, 22 Feb 2020 20:20:49 +0100 writes:>>>>> William Dunlap >>>>> on Fri, 21 Feb 2020 14:05:49 -0800 writes:>> If we change the behavior NULL--[[--assignment from >> `[[<-`(NULL, 1, "a" ) # gives "a" (*not* a list) >> to >> `[[<-`(NULL, 1, "a" ) # gives list("a") >> then we have more consistency there *and* your bug is fixed too. >> Of course, in other situations back-compatibility would be >> broken as well. >> Would that change the result of >> L <- list(One=1) ; L$Two[[1]] <- 2 >> from the current list(One=1,Two=2) to list(One=1, Two=list(2)) >> and the result of >> F <- 1L ; levels(F)[[1]] <- "one" >> from structure(1L, levels="one") to structure(1L, levels=list("one"))? > Yes (twice). > This is indeed what happens in current R-devel, as I had > committed the proposition above yesterday. > So R-devel (with svn rev >= 77844 ) does this : >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L) > list(One = 1, Two = list(2)) >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F) > structure(1L, .Label = list("one")) >> > but I find that still considerably more logical than current > (pre R-devel) R's >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L) > list(One = 1, Two = 2) >> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L) > list(One = 1, Two = list(2:3)) >> >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F) > structure(1L, .Label = "one") >> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ; dput(F) > structure(1L, .Label = list(c("one", "TWO"))) >> >> This change would make L$Name[[1]] <- value act like L$Name$one <- value >> in cases when L did not have a component named "Name" and value >> had length 1. > (I don't entirely get what you mean, but) > indeed, > the [[<- assignments will be closer to corresponding $<- assignments... > which I thought would be another good thing about the change. >> I have seen users use [[<- where [<- is more appropriate in cases like >> this. Should there be a way to generate warnings about the change in >> behavior as you've done with other syntax changes? > Well, good question. > I'd guess one would get such warnings "all over the place", and > if a warning is given only once per session it may not be > effective ... also the warning be confusing to the 99.9% of R users who > don't even get what we are talking about here ;-) > Thank you for your comments.. I did not get too many. Well, there's one situation where semi-experienced package authors are bitten by the new R-devel behavior... I'm seeing a few dozen CRAN packages breaking in R-devel >= r77884. One case is exactly as you (Bill) mention above: people using dd[[.]] <- .. where they should use single [.]. In one package, I see an inefficient for loop over all rows of a data frame 'dd' for(i in 1:nrow(dd)) { ... dd$<nonexisting_column>[[i]] <- <one character string> } This used to work -- as said quite inefficiently: for i=1 it created the **full** data frame column and then, once the column exists, it presumably does assign one entry after the other... Now this code breaks (later!) in the package now, because the new column ends up as a *list* of strings, instead of a vector of strings. I think there are quite a few such cases also in other CRAN packages which now break with the latest R-devel. Coming back to Bill Dunlap's question: Should we not warn here? And now when our toplevel list is a data frame, maybe we should warn indeed, if we can easily limit ourselves to such "bizarre" ways of growng a data frame ... dd $ foo [[i]] <- vv <==> `*tmp*` <- dd dd <- `$<-`(`*tmp*`, value = `[[<-`(`*tmp*`$foo, i, vv)) rm(`*tmp*`) but then really we have the same problem as previously: The `[[<-`(NULL, i, vv) part does not "know" anything about the fact that we are in a data frame column creation context. If the R package author had used '[i]' instead of '[[i]]' he|she would have been safe (as they would be if they worked more efficiently and created the whole variable as a vector and only then added it to the data frame ... but then, it seems people want to perpetuate the claim of R to be slow ... even if it's them who make R run slowly ... ;-))
William Dunlap
2020-Feb-22 22:18 UTC
[Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?]
> but then, it seems people want to perpetuate the > claim of R to be slowMore charitably, I think that the thinking may have been that since x[[i]] gives you one element of x, they should use x[[i]]<-value, for scalar i, to stick in one element. Bill Dunlap TIBCO Software wdunlap tibco.com On Sat, Feb 22, 2020 at 12:44 PM Martin Maechler <maechler at stat.math.ethz.ch> wrote:> >>>>> Martin Maechler > >>>>> on Sat, 22 Feb 2020 20:20:49 +0100 writes: > > >>>>> William Dunlap > >>>>> on Fri, 21 Feb 2020 14:05:49 -0800 writes: > > >> If we change the behavior NULL--[[--assignment from > > >> `[[<-`(NULL, 1, "a" ) # gives "a" (*not* a list) > > >> to > > >> `[[<-`(NULL, 1, "a" ) # gives list("a") > > >> then we have more consistency there *and* your bug is fixed too. > >> Of course, in other situations back-compatibility would be > >> broken as well. > > >> Would that change the result of > >> L <- list(One=1) ; L$Two[[1]] <- 2 > >> from the current list(One=1,Two=2) to list(One=1, Two=list(2)) > > >> and the result of > >> F <- 1L ; levels(F)[[1]] <- "one" > >> from structure(1L, levels="one") to structure(1L, > levels=list("one"))? > > > Yes (twice). > > > This is indeed what happens in current R-devel, as I had > > committed the proposition above yesterday. > > So R-devel (with svn rev >= 77844 ) does this : > > >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L) > > list(One = 1, Two = list(2)) > >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F) > > structure(1L, .Label = list("one")) > >> > > > but I find that still considerably more logical than current > > (pre R-devel) R's > > >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L) > > list(One = 1, Two = 2) > >> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L) > > list(One = 1, Two = list(2:3)) > >> > >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F) > > structure(1L, .Label = "one") > >> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ; dput(F) > > structure(1L, .Label = list(c("one", "TWO"))) > >> > > > >> This change would make L$Name[[1]] <- value act like L$Name$one <- > value > >> in cases when L did not have a component named "Name" and value > >> had length 1. > > > (I don't entirely get what you mean, but) > > indeed, > > the [[<- assignments will be closer to corresponding $<- > assignments... > > which I thought would be another good thing about the change. > > >> I have seen users use [[<- where [<- is more appropriate in cases > like > >> this. Should there be a way to generate warnings about the change > in > >> behavior as you've done with other syntax changes? > > > Well, good question. > > I'd guess one would get such warnings "all over the place", and > > if a warning is given only once per session it may not be > > effective ... also the warning be confusing to the 99.9% of R users > who > > don't even get what we are talking about here ;-) > > > Thank you for your comments.. I did not get too many. > > Well, there's one situation where semi-experienced package > authors are bitten by the new R-devel behavior... > > I'm seeing a few dozen CRAN packages breaking in R-devel >= r77884. > > One case is exactly as you (Bill) mention above: people using > dd[[.]] <- .. where they should use single [.]. > > In one package, I see an inefficient for loop over all rows of a > data frame 'dd' > > for(i in 1:nrow(dd)) { > > ... > > dd$<nonexisting_column>[[i]] <- <one character string> > > } > > This used to work -- as said quite inefficiently: > for i=1 it created the **full** data frame column and then, > once the column exists, it presumably does assign one entry > after the other... > > Now this code breaks (later!) in the package now, because the > new column ends up as a *list* of strings, instead of a vector > of strings. > > I think there are quite a few such cases also in other CRAN > packages which now break with the latest R-devel. > > Coming back to Bill Dunlap's question: Should we not warn here? > And now when our toplevel list is a data frame, maybe we should > warn indeed, if we can easily limit ourselves to such "bizarre" > ways of growng a data frame ... > > > dd $ foo [[i]] <- vv > > <==> > > `*tmp*` <- dd > dd <- `$<-`(`*tmp*`, value = `[[<-`(`*tmp*`$foo, i, vv)) > rm(`*tmp*`) > > but then really we have the same problem as previously: The > `[[<-`(NULL, i, vv) part does not "know" anything about the > fact that we are in a data frame column creation context. > > If the R package author had used '[i]' instead of '[[i]]' > he|she would have been safe > > (as they would be if they worked more efficiently and created > the whole variable as a vector and only then added it to the > data frame ... but then, it seems people want to perpetuate the > claim of R to be slow ... even if it's them who make R run > slowly ... ;-)) > >[[alternative HTML version deleted]]