thr3ads.net - R help - [R] using non-ASCII strings in R packages [Jan 2007]

Bojanowski, M.J. (Michal)

2007-Jan-25 01:02 UTC

[R] using non-ASCII strings in R packages

Hello dear useRs and wizaRds,

I am currently developing a package that will make it possible to use an
administrative map of Poland in R plots. Among other things, I wanted to include
region names in proper Polish so that they can be used in creating graphics etc.
I am working on Windows, and when I build the package it complains about
non-ASCII characters in the R code files.

I was wondering what would be the best way to implement them in a
platform-independent way, so that they can be used in computations as well as in
producing PS, PDF and other graphic output. Unfortunately, I have limited
knowledge of encoding schemes etc. Is it OK to include them in the Windows-1250
encoding (the default for the Polish locale, as far as I know)? I believe this
problem is frequently encountered with other "non-latin1" languages. If it is
not the way to go, I would be very grateful for suggestions.

Thanks in advance
and kind regards,

Michal Bojanowski

____________________________________
Michal Bojanowski
ICS / Department of Sociology
Utrecht University
Heidelberglaan 2; 3584 CS Utrecht
Room 1428
m.j.bojanowski@fss.uu.nl
http://www.fss.uu.nl/soc/bojanowski/


Prof Brian Ripley

2007-Jan-25 09:17 UTC

On Thu, 25 Jan 2007, Bojanowski, M.J.  (Michal) wrote:
> Hello dear useRs and wizaRds,
>
> I am currently developing a package that will make it possible to use an
> administrative map of Poland in R plots. Among other things, I wanted to
> include region names in proper Polish so that they can be used in
> creating graphics etc. I am working on Windows, and when I build the
> package it complains about non-ASCII characters in the R code files.
>
> I was wondering what would be the best way to implement them in a
> platform-independent way, so that they can be used in computations as
> well as in producing PS, PDF and other graphic output. Unfortunately, I
> have limited knowledge of encoding schemes etc. Is it OK to include
> them in the Windows-1250 encoding (the default for the Polish locale, as
> far as I know)? I believe this problem is frequently encountered with
> other "non-latin1" languages.
Well, infrequently, and it has been answered a few times before (including 
in my talk at UseR 2006, 
http://www.r-project.org/useR-2006/Slides/Ripley.pdf).
> If it is not the way to go, I would be very grateful for suggestions.
Since a Japanese-language Windows machine cannot reproduce Polish 
non-ASCII characters, the portability you seek is not possible for reasons 
outside R.  And many other systems cannot plot in both Polish and their 
native language, or at least not in the same font.

ISOLatin2 is the standard 8-bit encoding for Polish: Windows CP1250 is a 
superset, AFAIR.  If all your users are using an 8-bit Polish locale, 
ISOLatin2 would be safe, but not otherwise.  Even then, there is no 
guarantee that the Polish characters would be in the fonts used in 
PostScript and PDF: some fonts only cover ISOLatin1.
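A caveat on the CP1250 point, which the reply itself hedges with "AFAIR": Windows-1250 is close to ISO-8859-2 but not a byte-for-byte superset of it. The letters ą, ś, ź and their capitals sit at different byte values in the two code pages, so a file has to be converted from the encoding it was actually saved in. The sketch below shows this for the three lower-case letters; it assumes the platform's iconv() accepts the names "ISO-8859-2" and "CP1250" (true for glibc and for Windows builds of R, but worth checking elsewhere).

```r
# Compare how ISO-8859-2 and CP1250 store three Polish letters.
# Assumes this platform's iconv() accepts both encoding names.
polish <- "\u0105\u015b\u017a"  # a-ogonek, s-acute, z-acute

latin2_bytes <- iconv(polish, from = "UTF-8", to = "ISO-8859-2",
                      toRaw = TRUE)[[1]]
cp1250_bytes <- iconv(polish, from = "UTF-8", to = "CP1250",
                      toRaw = TRUE)[[1]]

print(latin2_bytes)                     # [1] b1 b6 bc
print(cp1250_bytes)                     # [1] b9 9c 9f
identical(latin2_bytes, cp1250_bytes)   # FALSE: the byte values differ
```

Letters such as Ł, ó and ż do occupy the same positions in both code pages, which is why the two are easy to mistake for one another.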

There is one thing you can do to make this a little more portable (and 
avoid the warnings).  If you store the strings concerned in a text file in 
ISOLatin2, and read them into R at run time (e.g. when your package is 
loaded), you can make use of file(encoding=) or iconv() to convert them to 
the current encoding.  That will succeed in ISOLatin2 or CP1250 or UTF-8 
locales and fail otherwise.
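A minimal sketch of that run-time approach, assuming the names are stored one per line in a file saved as ISO-8859-2 in the package's inst/extdata directory. The file name regions-latin2.txt, the helper read_latin2_lines() and the environment .regions are hypothetical names chosen for illustration, not part of any existing package. Because iconv() is called with its default sub = NA, a name the current locale cannot represent comes back as NA rather than as garbled text, which is the "fail otherwise" case.

```r
# Hypothetical helper: read lines saved in ISO-8859-2 and convert them.
# to = "" means "the encoding of the current locale".
read_latin2_lines <- function(path, to = "") {
  raw_lines <- readLines(path, warn = FALSE)  # bytes are read unconverted
  iconv(raw_lines, from = "ISO-8859-2", to = to)
}

# Alternative using a re-encoding connection: file(encoding =) converts
# from ISO-8859-2 to the session's native encoding as lines are read.
read_latin2_lines_con <- function(path) {
  con <- file(path, encoding = "ISO-8859-2")
  on.exit(close(con))
  readLines(con, warn = FALSE)
}

# Private storage for the converted names (hypothetical).
.regions <- new.env(parent = emptyenv())

# Convert once, when the package namespace is loaded; the data file
# "extdata/regions-latin2.txt" is an assumed name.
.onLoad <- function(libname, pkgname) {
  path <- system.file("extdata", "regions-latin2.txt", package = pkgname)
  if (nzchar(path)) {
    assign("region_names", read_latin2_lines(path), envir = .regions)
  }
}
```

Converting once in .onLoad means the rest of the package only ever sees strings in the session's own encoding.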

Unfortunately, that is not the end of the story for users of UTF-8 locales,
as postscript() and pdf() do not support UTF-8 (as the graphics languages
do not) and need to be told to use encoding="ISOLatin2.enc", and the X11
system has a mind of its own and may not show non-ASCII characters in some
fonts (or worse, render them incorrectly).
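For the PDF case, the encoding is chosen when the device is opened. The sketch below assumes PDF output and uses a few city names as stand-in labels. It only selects the ISOLatin2 encoding file that ships with R; it cannot guarantee that the font actually contains the glyphs, so R may still warn about or substitute characters it cannot map.

```r
# Example labels written with \u escapes so this source file stays ASCII.
labels <- c("\u0141\u00f3d\u017a", "Krak\u00f3w", "Gda\u0144sk")

out <- tempfile(fileext = ".pdf")
# encoding = "ISOLatin2.enc" selects R's Latin-2 encoding file; the same
# argument is accepted by postscript().
pdf(out, encoding = "ISOLatin2.enc")
plot(1:3, 1:3, type = "n", xlab = "", ylab = "", axes = FALSE)
text(1:3, 1:3, labels)
invisible(dev.off())
```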

The use of Unicode was supposed to reduce the impact of Babel.  But
implementation split into two camps (Windows with UCS-2 and Unix-alikes 
with UTF-8) and some important players (e.g. Adobe) have ignored it, so it 
has only been a very partial solution.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
