Suharto Anggono Suharto Anggono
2017-Aug-01 16:54 UTC
[Rd] translateChar in NewName in bind.c
For the 2nd example, I say that the R 3.4.1 result is acceptable, as names(c(x)) and names(x) are equal. The change exposed by the 2nd example is in line with the statement of the NEWS item corresponding to PR#17284: "c() and unlist() are now more efficient in constructing the names(.) of their return value, ...." However, currently, the NEWS item is for R-devel, not R 3.4.1 patched.

--------------------------------------------
On Mon, 31/7/17, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

 Subject: Re: [Rd] translateChar in NewName in bind.c
 Cc: r-devel at r-project.org
 Date: Monday, 31 July, 2017, 8:38 PM

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Sun, 30 Jul 2017 14:57:53 +0000 writes:

    > R devel's bind.c has been ported to R patched. Is it OK while names of 'unlist' or 'c' result may be not strictly the same as in R 3.4.1 because of changed function 'NewName' in bind.c?

    > Using 'translateCharUTF8' instead of 'translateChar' is as it should be. It has an effect in non-UTF-8 locale for this example.

    > x <- list(1:2)
    > names(x) <- "\ue7"
    > res <- unlist(x)
    > charToRaw(names(res)[1])

    > Directly assigning 'tag' to 'ans' is more efficient, but
    > may be different from in R 3.4.1 that involves
    > 'translateCharUTF8', that is also correct. It has an
    > effect for this example.

    > x <- 0
    > names(x) <- "\xe7"
    > Encoding(names(x)) <- "latin1"
    > res <- c(x)
    > Encoding(names(res))
    > charToRaw(names(res))

Yes, you are right, thank you: That part of the changes in bind.c was *not* directly related to the two R-bugs (PR#17284 & PR#17292)... and therefore, maybe I should not have ported it to R-patched (= R 3.4.1 patched).

Your examples above are instructive.. notably the 2nd one seems to demonstrate to me, that the change also *did* fix a bug: Encoding(names(res)) is "latin1" in R-devel but interestingly is "UTF-8" in R 3.4.1, indeed independently of the locale.
I would argue R-devel (and current R-patched) is more faithful by keeping the Encoding "latin1" that was set for names(x) also in the names(c(x)).

I could revert R-patched's bind.c (so it only contains the two official bug fixes PR#172(84|92)), but I wonder if it is desirable in this case. I'm glad for further reasoning. Given current "knowledge"/"evidence", I would not revert R-patched to R 3.4.1's behavior.

Martin
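
A minimal sketch of the comparison discussed for the 2nd example, for anyone who wants to check their own build. It only uses the calls already shown in the thread plus identical(); the variable names (enc_before, enc_after, raw_before, raw_after, same_names) are made up for illustration. According to the thread, enc_after is "latin1" in R-devel and "UTF-8" in R 3.4.1, so no particular value is assumed here.

```r
# Name marked as latin1: the single byte 0xe7.
x <- 0
names(x) <- "\xe7"
Encoding(names(x)) <- "latin1"

res <- c(x)

# Declared encoding and underlying bytes, before and after c().
enc_before <- Encoding(names(x))
enc_after  <- Encoding(names(res))
raw_before <- charToRaw(names(x))
raw_after  <- charToRaw(names(res))

# identical() treats strings in different marked encodings as equal
# when they agree after translation to UTF-8, so this is expected to
# hold whether the name was kept as latin1 or re-encoded as UTF-8.
same_names <- identical(names(res), names(x))

print(list(enc_before = enc_before, enc_after = enc_after,
           raw_before = raw_before, raw_after = raw_after,
           same_names = same_names))
```

If same_names is TRUE while enc_after differs between versions, both observations in the thread hold together: the names are equal as strings, and only the marked encoding (and hence the raw bytes) changes.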