On 2017-05-24, Duncan Murdoch wrote:
>
> I think the test is wrong because in the first case you are working in a
> locale where that character is representable. In my locale it is not, so x1
> is converted to UTF-8, and everything compares equal.
>
> An explicit conversion of x1 to UTF-8 should fix this, i.e. replace
>
> x1 <- path.expand(paste0("~/", filename))
>
> with
>
> x1 <- enc2utf8(path.expand(paste0("~/", filename)))
>
> Could you try this and see if it helps?

Nope:

> ## path.expand shouldn't translate to local encoding PR#17120
> filename <- "\U9b3c.R"
>
> x11 <- path.expand(paste0("~/", filename))
> print(Encoding(x11))
[1] "unknown"
> x12 <- enc2utf8( path.expand(paste0("~/", filename)) )
> print(Encoding(x12))
[1] "unknown"
> x2 <- paste0(path.expand("~/"), filename)
> print(Encoding(x2))
[1] "UTF-8"
>
> #stopifnot(identical(path.expand(paste0("~/", filename)),
> stopifnot(identical(enc2utf8( path.expand(paste0("~/", filename)) ),
+           paste0(path.expand("~/"), filename)))
Error: identical(enc2utf8(path.expand(paste0("~/", filename))), paste0(path.expand("~/"), .... is not TRUE
Execution halted

I forgot to report:

> sessionInfo()
R Under development (unstable) (2017-05-23 r72721)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/local/share/R-devel/lib/libRblas.so
LAPACK: /usr/local/share/R-devel/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.5.0

h.
--
+---
| Hiroyuki Kawakatsu
| Business School, Dublin City University
| Dublin 9, Ireland. Tel +353 (0)1 700 7496
On 24/05/2017 7:59 AM, Hiroyuki Kawakatsu wrote:
> On 2017-05-24, Duncan Murdoch wrote:
>>
>> I think the test is wrong because in the first case you are working in a
>> locale where that character is representable. In my locale it is not, so x1
>> is converted to UTF-8, and everything compares equal.
>>
>> An explicit conversion of x1 to UTF-8 should fix this, i.e. replace
>>
>> x1 <- path.expand(paste0("~/", filename))
>>
>> with
>>
>> x1 <- enc2utf8(path.expand(paste0("~/", filename)))
>>
>> Could you try this and see if it helps?
>
> Nope:

Okay, how about if we weaken the test? Instead of

stopifnot(identical(path.expand(paste0("~/", filename)),
                    paste0(path.expand("~/"), filename)))

try

stopifnot(path.expand(paste0("~/", filename)) ==
          paste0(path.expand("~/"), filename))

Duncan Murdoch
On 2017-05-24, Duncan Murdoch wrote:
[...]
> Okay, how about if we weaken the test?
[...]
> try
>
> stopifnot(path.expand(paste0("~/", filename)) ==
>           paste0(path.expand("~/"), filename))
>

Nope:

> ## path.expand shouldn't translate to local encoding PR#17120
> filename <- "\U9b3c.R"
>
> #stopifnot(identical(path.expand(paste0("~/", filename)),
> stopifnot(path.expand(paste0("~/", filename)) ==
+           paste0(path.expand("~/"), filename))
Error: path.expand(paste0("~/", filename)) == paste0(path.expand("~/"), .... is not TRUE
Execution halted

The problem is that path.expand(), or do_pathexpand() for non-windoze,
calls translateChar(), which in turn calls translateToNative(), which is
unknown to make check (but not to R --vanilla) under my setup. Once it
is unknown, there seems to be no way to force an encoding:

> ## path.expand shouldn't translate to local encoding PR#17120
> filename <- "\U9b3c.R"
> print(Encoding(filename))
[1] "UTF-8"
>
> y1 <- paste0("~/", filename)
> print(Encoding(y1))
[1] "UTF-8"
>
> y2 <- path.expand(y1)
> print(Encoding(y2))
[1] "unknown"
>
> y3a <- iconv(y2, to="UTF-8")
> print(Encoding(y3a))
[1] "unknown"
>
> y3b <- enc2utf8(y2)
> print(Encoding(y3b))
[1] "unknown"
>
> Encoding(y2) <- "UTF-8"
> print(Encoding(y2))
[1] "unknown"
>

h.
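
One way to narrow this down might be to look at the raw bytes of the two strings rather than only their declared encodings, since Encoding() reports the mark on a string and not its content. The following is a minimal diagnostic sketch, not a fix: describe_string() is a hypothetical helper introduced here for illustration, and the sketch assumes nothing about what path.expand() does internally.

```r
## Hypothetical helper (not part of R): report the declared encoding
## and the raw bytes of a single string.
describe_string <- function(x) {
  list(encoding = Encoding(x), bytes = charToRaw(x))
}

filename <- "\U9b3c.R"
x1 <- path.expand(paste0("~/", filename))   # expanded after pasting
x2 <- paste0(path.expand("~/"), filename)   # expanded before pasting

## What does the session think its locale is?
str(l10n_info())

## Declared encoding and bytes of each side of the comparison.
str(describe_string(x1))
str(describe_string(x2))

## TRUE if the two strings are byte-for-byte the same, whatever their marks.
same_bytes <- identical(charToRaw(x1), charToRaw(x2))
print(same_bytes)
```

Running this both under R --vanilla and from within make check should show whether the bytes themselves differ between the two runs or only the encoding mark does.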