thr3ads.net - R devel - [Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?] [Feb 2020]

If this information is useful, please help other people find it:
Share via:

Martin Maechler

2020-Feb-22 19:20 UTC

[Rd] dimnames incoherence?

>>>>> William Dunlap 
>>>>>     on Fri, 21 Feb 2020 14:05:49 -0800 writes:
    > If we change the behavior  NULL--[[--assignment from

    > `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a
list)

    > to

    > `[[<-`(NULL, 1, "a" ) # gives  list("a")

    > then we have more consistency there *and* your bug is fixed too.
    > Of course, in other situations back-compatibility would be
    > broken as well.

    > Would that change the result of
    > L <- list(One=1) ; L$Two[[1]] <- 2
    > from the current list(One=1,Two=2) to list(One=1, Two=list(2))

    > and the result of
    > F <- 1L ; levels(F)[[1]] <- "one"
    > from structure(1L, levels="one") to structure(1L,
levels=list("one"))?

Yes (twice).

This is indeed what happens in current R-devel, as I had
committed the proposition above yesterday.
So R-devel (with svn rev >= 77844 )  does this :
> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
list(One = 1, Two = list(2))>  F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
structure(1L, .Label = list("one"))> 
but I find that still considerably more logical than current
(pre R-devel) R's
> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
list(One = 1, Two = 2)> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)
list(One = 1, Two = list(2:3))> 
> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
structure(1L, .Label = "one")> F <- 1L ; levels(F)[[1]] <- c("one", "TWO") ;
dput(F)structure(1L, .Label = list(c("one",
"TWO")))>

    > This change would make L$Name[[1]] <- value act like L$Name$one
<- value
    > in cases when L did not have a component named "Name" and
value
    > had length 1.

(I don't entirely get what you mean, but)
indeed,
the  [[<-  assignments will be closer to corresponding $<-  assignments...
which I thought would be another good thing about the change.

    > I have seen users use [[<- where [<- is more appropriate in cases
like
    > this.  Should there be a way to generate warnings about the change in
    > behavior as you've done with other syntax changes?

Well, good question.
I'd guess one would get such warnings "all over the place",  and
if a warning is given only once per session it may not be
effective  ... also the warning be confusing to the 99.9% of R users who
don't even get what we are talking about here ;-)

Thank you for your comments.. I did not get too many.

Martin

    > Bill Dunlap
    > TIBCO Software
    > wdunlap tibco.com


    > On Wed, Feb 19, 2020 at 12:59 PM Martin Maechler <maechler at
stat.math.ethz.ch>
    > wrote:

    >> >>>>> Martin Maechler
    >> >>>>>     on Wed, 19 Feb 2020 18:06:57 +0100 writes:
    >> 
    >> >>>>> Serguei Sokol
    >> >>>>>     on Wed, 19 Feb 2020 15:21:21 +0100 writes:
    >> 
    >> >> Hi,
    >> >> I was bitten by a little incoherence in dimnames
assignment or may
    >> be I
    >> >> missed some point.
    >> >> Here is the case. If I assign row names via
dimnames(a)[[1]], when
    >> >> nrow(a)=1 then an error is thrown. But if I do the same
when
    >> nrow(a) > 1
    >> >> it's OK. Is one of this case works unexpectedly? Both?
Neither?
    >> 
    >> >> a=as.matrix(1)
    >> >> dimnames(a)[[1]]="a" # error: 'dimnames'
must be a list
    >> 
    >> >> aa=as.matrix(1:2)
    >> >> dimnames(aa)[[1]]=c("a", "b") # OK
    >> 
    >> >> In the second case, dimnames(aa) is not a list (like in
the first
    >> case)
    >> >> but it works.
    >> >> I would expect that the both work or neither.
    >> 
    >> > I agree (even though I'm strongly advising people to use
'<-'
    >> > instead of '=');
    >> > which in this case helps you get the name of the function
really
    >> > involved:  It is  `dimnames<-`  (which is implemented in C
    >> > entirely, for matrices and other arrays).
    >> 
    >> As a matter of fact, I wrote too quickly, the culprit here is
    >> the  `[[<-`  function (rather than `dimnames<-`),
    >> which has a special "inconsistency" feature when used to
"add to NULL";
    >> almost surely inherited from S,  but I now think we should
    >> consider dropping on the occasion of aiming for  R 4.0.0 :
    >> 
    >> It's documented in ?Extract  that  length 1  `[[.]]`-assignment
works
    >> specially for NULL (and dimnames(.) are NULL here).
    >> 
    >> Note you need to read and understand one of the tougher sections
    >> in the official  'R Language Definition'  Manual,
    >> section -- 3.4.4 Subset assignment ---
    >> i.e.,
    >> 
    >>
https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Subset-assignment
    >> 
    >> notably this part:
    >> 
    >> Nesting of complex assignments is evaluated recursively
    >> 
    >> names(x)[3] <- "Three"
    >> 
    >> is equivalent to
    >> 
    >> `*tmp*` <- x
    >> x <- "names<-"(`*tmp*`,
value="[<-"(names(`*tmp*`), 3, value="Three"))
    >> rm(`*tmp*`)
    >> 
    >> and then, apply this to our    dimnames(a)[[1]] <- "a"
    >> and so  replace
    >> 
    >> -  'names<-' by 'dimnames<-'
    >> -  '[<-'     by '[[<-'
    >> 
    >> ----------
    >> 
    >> Here is the rest of my analysis as valid R code
    >> {this is not new, Peter Dalgaard had explained this 10 or 20
    >> years ago to a mailing list audience IIRC} :
    >> 
    >> ## MM: The problematic behavior (bug ?) is in `[[<-`, not in
`dimnames<-` :
    >> 
    >> `[[<-`(NULL, 1,   "a"     ) # gives  "a" 
(*not* a list)
    >> `[[<-`(NULL, 1, c("a","b")) # gives
list(c("a","b"))  !!
    >> 
    >> ##==> in C code: in  subassign.c  [
~/R/D/r-devel/R/src/main/subassign.c ]
    >> ##==> function (~ 340 lines)
    >> ##            do_subassign2_dflt(SEXP call, SEXP op, SEXP args,
SEXP rho)
    >> ## has
    >> "
    >> line svn r.  svn auth. c.o.d.e...
    >> ---- ------  ---------
----------------------------------------------
    >> 1741   4166      ihaka     if (isNull(x)) {
    >> 1742  45446     ripley         if (isNull(y)) {
    >> 1743  76166       luke             UNPROTECT(2); /* args, y */
    >> 1744   4166      ihaka             return x;
    >> 1745  45446     ripley         }
    >> 1746  35680    murdoch         if (length(y) == 1)
    >> 1747  68094       luke             x = allocVector(TYPEOF(y), 0);
    >> 1748  24954     ripley         else
    >> 1749  68094       luke             x = allocVector(VECSXP, 0);
    >> 1750   1820      ihaka     }
    >> ---- ------  ---------
----------------------------------------------
    >> "
    >> ## so clearly, in case the value is of length 1, no list is created
.
    >> 
    >> ## For dimnames<-  Replacing NULL by list()  should be done in
both cases
    >> , and then things work :
    >> `[[<-`(list(), 1,   "a"     ) # gives list(
"a" )
    >> `[[<-`(list(), 1, c("a","b")) # gives
list(c("a","b"))  !!
    >> 
    >> ## but the problem here is that  `[[<-` at this time in the game
    >> ## does *not* know that it comes from dimnames<- ....
    >> 
    >> ---------------
    >> 
    >> If we change the behavior  NULL--[[--assignment from
    >> 
    >> `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a
list)
    >> 
    >> to
    >> 
    >> `[[<-`(NULL, 1, "a" ) # gives  list("a")
    >> 
    >> 
    >> then we have more consistency there *and* your bug is fixed too.
    >> Of course, in other situations back-compatibility would be
    >> broken as well.
    >> 
    >> At the moment, I think we (R Core Team) should consider doing
    >> that here.
    >> 
    >> Martin
    >> 
    >> 
    >> 
    >> >> Your thoughts are welcome.
    >> 
    >> > I think we'd be happy if you report this formally on
R's
    >> > bugzilla - https://bugs.r-project.org/ - as a bug.
    >> 
    --> https://www.r-project.org/bugs.html
    >> 
    >> >> From reading bugs.html you note that you should ask for
    >> >> an account there;
    >> > as I'm one of the people who get such request by e-mail,
    >> > in this case, I can do it directly (if you confirm you'd
    >> > want in a private e-mail).
    >> 
    >> >> Best, Serguei.
    >> 
    >> >> PS the same apply for dimnames(a)[[2]]<-.
    >> 
    >> > (of course)
    >> 
    >> > NB *and* importantly, the buglet is still in current
    >> > versions of R
    >> 
    >> > Best, Martin
    >> 
    >> >>> sessionInfo()
    >> >> R version 3.6.1 (2019-07-05) Platform:
    >> >> x86_64-pc-linux-gnu (64-bit) Running under: Mageia 7
    >> 
    >> >> Matrix products: default BLAS/LAPACK:
    >> >> /home/opt/OpenBLAS/lib/libopenblas_sandybridge-r0.3.6.so
    >> 
    >> >> locale:  [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C  [3]
    >> >> LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8  [5]
    >> >> LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8  [7]
    >> >> LC_PAPER=fr_FR.UTF-8       LC_NAME=C  [9]
    >> >> LC_ADDRESS=C               LC_TELEPHONE=C [11]
    >> >> LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
    >> 
    >> >> attached base packages: [1] parallel  stats     graphics
    >> >> grDevices utils     datasets methods [8] base
    >> 
    >> >> other attached packages: [1] multbxxc_1.0.1
    >> >> rmumps_5.2.1-11 [3] arrApply_2.1
    >> >> RcppArmadillo_0.9.800.4.0 [5] Rcpp_1.0.3
    >> >> slam_0.1-47 [7] nnls_1.4
    >> 
    >> >> loaded via a namespace (and not attached): [1]
    >> >> compiler_3.6.1   tools_3.6.1      codetools_0.2-16
    >> 
    >> >> ______________________________________________
    >> >> R-devel at r-project.org mailing list
    >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >> 
    >> > ______________________________________________
    >> > R-devel at r-project.org mailing list
    >> > https://stat.ethz.ch/mailman/listinfo/r-devel
    >> 
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >>

Martin Maechler

2020-Feb-22 20:43 UTC

head link

[Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?]

>>>>> Martin Maechler 
>>>>>     on Sat, 22 Feb 2020 20:20:49 +0100 writes:
>>>>> William Dunlap 
>>>>>     on Fri, 21 Feb 2020 14:05:49 -0800 writes:
    >> If we change the behavior  NULL--[[--assignment from

    >> `[[<-`(NULL, 1, "a" ) # gives  "a"  (*not* a
list)

    >> to

    >> `[[<-`(NULL, 1, "a" ) # gives  list("a")

    >> then we have more consistency there *and* your bug is fixed too.
    >> Of course, in other situations back-compatibility would be
    >> broken as well.

    >> Would that change the result of
    >> L <- list(One=1) ; L$Two[[1]] <- 2
    >> from the current list(One=1,Two=2) to list(One=1, Two=list(2))

    >> and the result of
    >> F <- 1L ; levels(F)[[1]] <- "one"
    >> from structure(1L, levels="one") to structure(1L,
levels=list("one"))?

    > Yes (twice).

    > This is indeed what happens in current R-devel, as I had
    > committed the proposition above yesterday.
    > So R-devel (with svn rev >= 77844 )  does this :

    >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
    > list(One = 1, Two = list(2))
    >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
    > structure(1L, .Label = list("one"))
    >> 

    > but I find that still considerably more logical than current
    > (pre R-devel) R's

    >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
    > list(One = 1, Two = 2)
    >> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)
    > list(One = 1, Two = list(2:3))
    >> 
    >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
    > structure(1L, .Label = "one")
    >> F <- 1L ; levels(F)[[1]] <- c("one",
"TWO") ; dput(F)
    > structure(1L, .Label = list(c("one", "TWO")))
    >> 

    >> This change would make L$Name[[1]] <- value act like L$Name$one
<- value
    >> in cases when L did not have a component named "Name" and
value
    >> had length 1.

    > (I don't entirely get what you mean, but)
    > indeed,
    > the  [[<-  assignments will be closer to corresponding $<- 
assignments...
    > which I thought would be another good thing about the change.

    >> I have seen users use [[<- where [<- is more appropriate in
cases like
    >> this.  Should there be a way to generate warnings about the change
in
    >> behavior as you've done with other syntax changes?

    > Well, good question.
    > I'd guess one would get such warnings "all over the
place",  and
    > if a warning is given only once per session it may not be
    > effective  ... also the warning be confusing to the 99.9% of R users
who
    > don't even get what we are talking about here ;-)

    > Thank you for your comments.. I did not get too many.

Well, there's one situation where semi-experienced package
authors are bitten by the new R-devel behavior...

I'm seeing a few dozen CRAN packages breaking in R-devel >= r77884.

One case is exactly as you (Bill) mention above: people using
dd[[.]] <- ..   where they should use single [.].

In one package, I see an inefficient for loop over all rows of a
data frame 'dd'

for(i in 1:nrow(dd)) {

 ...

 dd$<nonexisting_column>[[i]] <-  <one character string>

}

This used to work -- as said quite inefficiently:
for i=1 it created the **full** data frame column  and then,
once the column exists, it presumably does assign one entry
after the other...

Now this code breaks (later!) in the package now, because the
new column ends up as a *list* of strings, instead of a vector
of strings.

I think there are quite a few such cases also in other CRAN
packages which now break with the latest R-devel.

Coming back to Bill Dunlap's question: Should we not warn here?
And now when our toplevel list is a data frame, maybe we should
warn indeed, if we can easily limit ourselves to such "bizarre"
ways of growng a data frame  ...

  dd $ foo [[i]] <- vv

<==>

  `*tmp*` <- dd
  dd <- `$<-`(`*tmp*`, value = `[[<-`(`*tmp*`$foo, i, vv))
  rm(`*tmp*`)

but then really we have the same problem as previously: The
 `[[<-`(NULL, i, vv)  part does not "know" anything about the
fact that we are in a data frame column creation context.

If the R package author had used  '[i]' instead of '[[i]]'
he|she would have been safe

(as they would be if they worked more efficiently and created
the whole variable as a vector and only then added it to the
data frame ... but then, it seems people want to perpetuate the
claim of R to be slow ... even if it's them who make R run
slowly ... ;-))

William Dunlap

2020-Feb-22 22:18 UTC

head link

[Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?]

> but then, it seems people want to perpetuate the
> claim of R to be slow
More charitably, I think that the thinking may have been that since x[[i]]
gives you one element of x,
they should use x[[i]]<-value, for scalar i, to stick in one element.

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Sat, Feb 22, 2020 at 12:44 PM Martin Maechler <maechler at
stat.math.ethz.ch>
wrote:
> >>>>> Martin Maechler
> >>>>>     on Sat, 22 Feb 2020 20:20:49 +0100 writes:
>
> >>>>> William Dunlap
> >>>>>     on Fri, 21 Feb 2020 14:05:49 -0800 writes:
>
>     >> If we change the behavior  NULL--[[--assignment from
>
>     >> `[[<-`(NULL, 1, "a" ) # gives  "a" 
(*not* a list)
>
>     >> to
>
>     >> `[[<-`(NULL, 1, "a" ) # gives 
list("a")
>
>     >> then we have more consistency there *and* your bug is fixed
too.
>     >> Of course, in other situations back-compatibility would be
>     >> broken as well.
>
>     >> Would that change the result of
>     >> L <- list(One=1) ; L$Two[[1]] <- 2
>     >> from the current list(One=1,Two=2) to list(One=1, Two=list(2))
>
>     >> and the result of
>     >> F <- 1L ; levels(F)[[1]] <- "one"
>     >> from structure(1L, levels="one") to structure(1L,
> levels=list("one"))?
>
>     > Yes (twice).
>
>     > This is indeed what happens in current R-devel, as I had
>     > committed the proposition above yesterday.
>     > So R-devel (with svn rev >= 77844 )  does this :
>
>     >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
>     > list(One = 1, Two = list(2))
>     >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
>     > structure(1L, .Label = list("one"))
>     >>
>
>     > but I find that still considerably more logical than current
>     > (pre R-devel) R's
>
>     >> L <- list(One=1) ; L$Two[[1]] <- 2 ; dput(L)
>     > list(One = 1, Two = 2)
>     >> L <- list(One=1) ; L$Two[[1]] <- 2:3 ; dput(L)
>     > list(One = 1, Two = list(2:3))
>     >>
>     >> F <- 1L ; levels(F)[[1]] <- "one" ; dput(F)
>     > structure(1L, .Label = "one")
>     >> F <- 1L ; levels(F)[[1]] <- c("one",
"TWO") ; dput(F)
>     > structure(1L, .Label = list(c("one", "TWO")))
>     >>
>
>
>     >> This change would make L$Name[[1]] <- value act like
L$Name$one <-
> value
>     >> in cases when L did not have a component named
"Name" and value
>     >> had length 1.
>
>     > (I don't entirely get what you mean, but)
>     > indeed,
>     > the  [[<-  assignments will be closer to corresponding $<-
> assignments...
>     > which I thought would be another good thing about the change.
>
>     >> I have seen users use [[<- where [<- is more appropriate
in cases
> like
>     >> this.  Should there be a way to generate warnings about the
change
> in
>     >> behavior as you've done with other syntax changes?
>
>     > Well, good question.
>     > I'd guess one would get such warnings "all over the
place",  and
>     > if a warning is given only once per session it may not be
>     > effective  ... also the warning be confusing to the 99.9% of R
users
> who
>     > don't even get what we are talking about here ;-)
>
>     > Thank you for your comments.. I did not get too many.
>
> Well, there's one situation where semi-experienced package
> authors are bitten by the new R-devel behavior...
>
> I'm seeing a few dozen CRAN packages breaking in R-devel >= r77884.
>
> One case is exactly as you (Bill) mention above: people using
> dd[[.]] <- ..   where they should use single [.].
>
> In one package, I see an inefficient for loop over all rows of a
> data frame 'dd'
>
> for(i in 1:nrow(dd)) {
>
>  ...
>
>  dd$<nonexisting_column>[[i]] <-  <one character string>
>
> }
>
> This used to work -- as said quite inefficiently:
> for i=1 it created the **full** data frame column  and then,
> once the column exists, it presumably does assign one entry
> after the other...
>
> Now this code breaks (later!) in the package now, because the
> new column ends up as a *list* of strings, instead of a vector
> of strings.
>
> I think there are quite a few such cases also in other CRAN
> packages which now break with the latest R-devel.
>
> Coming back to Bill Dunlap's question: Should we not warn here?
> And now when our toplevel list is a data frame, maybe we should
> warn indeed, if we can easily limit ourselves to such "bizarre"
> ways of growng a data frame  ...
>
>
>   dd $ foo [[i]] <- vv
>
> <==>
>
>   `*tmp*` <- dd
>   dd <- `$<-`(`*tmp*`, value = `[[<-`(`*tmp*`$foo, i, vv))
>   rm(`*tmp*`)
>
> but then really we have the same problem as previously: The
>  `[[<-`(NULL, i, vv)  part does not "know" anything about the
> fact that we are in a data frame column creation context.
>
> If the R package author had used  '[i]' instead of '[[i]]'
> he|she would have been safe
>
> (as they would be if they worked more efficiently and created
> the whole variable as a vector and only then added it to the
> data frame ... but then, it seems people want to perpetuate the
> claim of R to be slow ... even if it's them who make R run
> slowly ... ;-))
>
>
	[[alternative HTML version deleted]]

Maybe Matching Threads

Search for more possibly parallel threads

R devel - Feb 2020 - Change 77844 breaking pkgs [Re: dimnames incoherence?]

[Rd] dimnames incoherence?

[Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?]

[Rd] Change 77844 breaking pkgs [Re: dimnames incoherence?]

Maybe Matching Threads