This can be narrowed down to
Sys.setlocale("LC_CTYPE","C")
x2 <- "\u00e7"
x1 <- iconv(x2, from="UTF-8", to="latin1")
x1 < x2 # FALSE or NA
In R 4.0 it returns NA, in R-devel it returns FALSE (when running in
CP1252 locale on Windows).
It is the same character, only the encoding is different, so the R-devel
return value is correct and the previous behavior was a bug. It should
not matter what the current native encoding is when doing the
comparison. Also, when the encodings are known, the collation order
should only apply after the characters are converted to a common
encoding, so in this case the collation order of the locale should not
have an impact, and it seems it doesn't. I don't think R should preserve
bug-compatibility in this case; code depending on this buggy behavior
should be fixed.
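For code that needs the same result on both R 4.0 and R-devel, one
option is to convert to a common encoding explicitly before comparing.
A minimal sketch, assuming the inputs carry correct encoding marks (the
names u1 and u2 are only for illustration):

```r
# The same character in two encodings, as in the example above.
x2 <- "\u00e7"                                   # marked as UTF-8
x1 <- iconv(x2, from = "UTF-8", to = "latin1")   # marked as latin1

# Convert both to UTF-8 explicitly, so the comparison does not depend
# on how a given R version handles mixed encodings.
u1 <- enc2utf8(x1)
u2 <- enc2utf8(x2)

Encoding(c(u1, u2))                       # both marked "UTF-8"
identical(charToRaw(u1), charToRaw(u2))   # same bytes after conversion
u1 == u2
```

This only helps when the encoding marks are right; a string marked
"unknown" is interpreted in the native encoding, so enc2utf8() cannot
repair a mislabelled input.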
I don't immediately see which NEWS entry this corresponds to. Please
keep in mind that NEWS doesn't cover all changes; for that you need to
look at the svn commits. Even then it may be hard to trace concrete
changes in behavior back to individual commits; to do that you need to
debug the code or bisect.
Changes to _documented_ behavior should be more visible and of course
reflected by changes in the documentation. If they are not, it is a bug
worth reporting, and the report should come with a reference to the
concrete parts of the documentation that are violated.
Best
Tomas
On 5/23/20 12:03 PM, Jan Gorecki wrote:
> Hi R developers,
> There seems to be a breaking change in base::order on Windows in
> R-devel. Code below yields different results on R 4.0.0 and R-devel
> (2020-05-22 r78545). I haven't found any info about that change in
> NEWS. Was the change intentional?
>
> Sys.setlocale("LC_CTYPE","C")
> Sys.setlocale("LC_COLLATE","C")
> x1 = "fa\xE7ile"
> Encoding(x1) = "latin1"
> x2 = iconv(x1, "latin1", "UTF-8")
> base::order(c(x2,x1,x1,x2))
> Encoding(x2) = "unknown"
> base::order(c(x2,x1,x1,x2))
>
> # R 4.0.0
> base::order(c(x2,x1,x1,x2))
> #[1] 1 4 2 3
> Encoding(x2) = "unknown"
> base::order(c(x2,x1,x1,x2))
> #[1] 2 3 1 4
>
> # R-devel
> base::order(c(x2,x1,x1,x2))
> #[1] 1 2 3 4
> Encoding(x2) = "unknown"
> base::order(c(x2,x1,x1,x2))
> #[1] 1 4 2 3
>
> Best Regards,
> Jan Gorecki