Thomas Zumbrunn
2012-Jan-18  22:54 UTC
[Rd] use of UTF-8 \uxxxx escape sequences in function arguments
While preparing a function that contained non-ASCII characters for inclusion 
into a package, I replaced all non-ASCII characters with UTF-8 escape 
sequences (using \uxxxx) in order to make the package portable (and adhere to
"R CMD check"). What I didn't expect: when one uses UTF-8 escape sequences in
function arguments, one needs to use UTF-8 escape sequences when calling the
function, too - even when working in a UTF-8 locale. Is this an intended 
behaviour?
Here's an example to illustrate the (putative) problem:
   ## function that uses non-ASCII characters in arguments
   plain <- function(myarg = c("Basel", "Bern", "Zürich")) {
     myarg <- match.arg(myarg)
   }
   ## function that uses UTF-8 escape sequences in arguments
   escaped <- function(myarg = c("Basel", "Bern", "Z\u00BCrich")) {
     myarg <- match.arg(myarg)
   }
   ## test
   plain("Zürich")  ## works
   plain("Z\u00BCrich")  ## fails
   escaped("Zürich")  ## fails
   escaped("Z\u00BCrich")  ## works
Thank you for your help.
Thomas Zumbrunn
> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
peter dalgaard
2012-Jan-18  23:17 UTC
[Rd] use of UTF-8 \uxxxx escape sequences in function arguments
On Jan 18, 2012, at 23:54, Thomas Zumbrunn wrote:

>    plain("Zürich")  ## works
>    plain("Z\u00BCrich")  ## fails
>    escaped("Zürich")  ## fails
>    escaped("Z\u00BCrich")  ## works

Using the correct UTF-8 code helps quite a bit:

U+00BC  ¼  c2 bc  VULGAR FRACTION ONE QUARTER
U+00FC  ü  c3 bc  LATIN SMALL LETTER U WITH DIAERESIS

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
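
Following the pointer above, here is a minimal sketch of the corrected function, assuming the intended character is ü (U+00FC). The variable name `u_umlaut` is purely illustrative and does not appear in the original post. The sketch uses only base R (`intToUtf8`, `utf8ToInt`, `sprintf`, `match.arg`) and avoids typing any non-ASCII character literally, so it should not depend on the encoding of the source file.

```r
## Corrected escape: U+00FC is LATIN SMALL LETTER U WITH DIAERESIS.
escaped <- function(myarg = c("Basel", "Bern", "Z\u00FCrich")) {
  match.arg(myarg)
}

## Look up the code point of the intended character before writing the escape.
## intToUtf8() builds the character from its code point (illustrative name).
u_umlaut <- intToUtf8(0x00FC)
code_point <- sprintf("U+%04X", utf8ToInt(u_umlaut))
print(code_point)  # "U+00FC"

## The escape used in the original post denotes a different character.
wrong_code_point <- sprintf("U+%04X", utf8ToInt("\u00BC"))
print(wrong_code_point)  # "U+00BC"

## With the corrected escape, the escaped spelling is accepted by match.arg().
res <- escaped("Z\u00FCrich")
print(identical(res, "Z\u00FCrich"))  # TRUE
```

In a UTF-8 locale, calling `escaped()` with the city name typed literally should then match as well, since the literal and the `\u00FC` escape denote the same character.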