Prof Brian Ripley
2007-Feb-16 07:26 UTC
[Rd] R-devel news: non-ASCII character strings in packages
R-devel (pre-2.5.0) now has enough facilities to allow packages with non-ASCII character strings to work reasonably well in locales where the fonts in use support the characters used. For example, names in Western European languages can be used on both Latin-1 (and hence Windows 1252) and UTF-8 systems. It should also be possible to make use of non-ASCII object names.

To enable this, two things need to be done.

1) The package encoding needs to be declared in the DESCRIPTION file.

2) Any character strings stored in .rda files need to be marked as Latin-1 or UTF-8 (see 'Writing R Extensions' for how to do so).

R CMD check will give NOTE or WARNING messages when it detects non-ASCII characters.

Please do bear in mind the caveat in the first paragraph: it is very unlikely that using French in a Chinese locale or vice versa will work correctly (even on a UTF-8 system).

The changes needed are backwards compatible: if you make them to your package, it will work equally well (or badly) in R < 2.5.0, and better in 2.5.0 when released.

Currently one CRAN package has non-ASCII object names and fifteen have non-ASCII data (as detected by R CMD check). Note that non-ASCII data need not be from non-English languages: Windows 1252 in particular has a variety of signs that are far from portable, most notably its misnamed 'smart quotes' (but also the Euro).

Finally, please do not add Encoding: to the DESCRIPTION of ASCII-only packages. It just slows things down and (unless latin1 is specified) restricts the package to only systems supporting iconv. (Yes, there are examples of this.)

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
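
The two steps listed in the message (declaring the package encoding and marking strings stored in .rda files) might look roughly like the following in R. This is only a sketch under stated assumptions, not the procedure from 'Writing R Extensions': the object name `x`, the sample name and the temporary file are hypothetical, and it assumes an R version that provides `Encoding()` and its replacement form.

```r
# Step 1 (assumed example): the package's DESCRIPTION file would carry a line
#   Encoding: latin1
# Only add it if the package really contains non-ASCII strings.

# Step 2: mark a string as Latin-1 before saving it in an .rda file.
# Hypothetical data: "\xe7" is c-cedilla as a single Latin-1 byte.
x <- "Fran\xe7ois"
Encoding(x) <- "latin1"      # declare how the bytes should be interpreted

# Optional: a UTF-8 copy of the same string, converted with iconv()
y <- iconv(x, from = "latin1", to = "UTF-8")

# Save to a (temporary, hypothetical) .rda file and read it back
f <- tempfile(fileext = ".rda")
save(x, file = f)
e <- new.env()
load(f, envir = e)

# The encoding mark survives the save/load round trip
print(Encoding(e$x))
```

Marking the string rather than converting it keeps the stored bytes unchanged, which fits the backwards-compatibility point in the message; whether the characters display correctly still depends on the locale and fonts in use.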