thr3ads.net - R help - [R] Plotting the ASCII character set. [Jul 2021]


Ivan Krylov

2021-Jul-03 07:40 UTC

[R] Plotting the ASCII character set.

Hello Rolf Turner,

On Sat, 3 Jul 2021 14:02:59 +1200
Rolf Turner <r.turner at auckland.ac.nz> wrote:
> Can anyone suggest how I might get my plot_ascii() function working
> again?  Basically, it seems to me, the question is:  how do I persuade
> R to read in "\260" as "\ub0" rather than "\xb0"?

Part of the problem is that the "\xb0" byte is not in ASCII, which
covers only the lower half of possible 8-bit bytes. I guess that the
strings containing bytes with highest bit set used to be interpreted as
Latin-1 on your machine, but now get interpreted as UTF-8, which
changes their meaning (in UTF-8, the highest bit being set indicates
that there will be more bytes to follow, making the string invalid if
there is none).
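
To illustrate the point about the high bit, here is a minimal sketch,
assuming R >= 3.3.0 (for validUTF8()). The variable name `a` is only
for illustration.

```r
# The single byte 0xB0 is the degree sign in Latin-1, but a lone byte
# with the highest bit set is not a complete UTF-8 sequence.
a <- "\xb0"
charToRaw(a)          # the one raw byte: b0
validUTF8(a)          # FALSE

# The same character written as a Unicode escape is stored as two
# bytes in UTF-8.
charToRaw("\u00b0")   # c2 b0
validUTF8("\u00b0")   # TRUE
```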

The good news is, since it's Latin-1, which is natively supported by R,
there are even multiple options:

1. Mark the string as Latin-1 by setting Encoding(a) <- 'latin1' and
let R do the re-encoding if and when Pango asks it for a UTF-8-encoded
string.

2. Decode Latin-1 into the locale encoding by using iconv(a, 'latin1',
'') (or set the third parameter to 'UTF-8', which would give almost the
same result on a machine with a UTF-8 locale). The result is, again, a
string where Encoding(a) matches the truth. Explicitly setting UTF-8
may be preferable on Windows machines running pre-UCRT builds of R
where the locale encoding may not contain all Latin-1 characters, but
that's not a problem for you, as far as I know.
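
As a rough sketch of the two options side by side (the names a1 and a2
are only for illustration, and I am using the degree sign byte from
your question as the input):

```r
a <- "\xb0"                        # raw Latin-1 byte for the degree sign

# Option 1: only declare the encoding; the stored bytes do not change.
a1 <- a
Encoding(a1) <- "latin1"
charToRaw(a1)                      # still the single byte b0

# Option 2: actually convert the bytes from Latin-1 to UTF-8.
a2 <- iconv(a, "latin1", "UTF-8")
charToRaw(a2)                      # two bytes: c2 b0
Encoding(a2)                       # "UTF-8"

# Either way the string now denotes the same code point, U+00B0.
utf8ToInt(enc2utf8(a1)) == utf8ToInt(a2)
```

With option 1 the conversion is deferred until something asks for
UTF-8; with option 2 it happens immediately.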

For any encoding other than Latin-1 or UTF-8, option (2) is still valid.

I have verified that your example works on my GNU/Linux system with a
UTF-8 locale if I use either option.

-- 
Best regards,
Ivan

Rolf Turner

2021-Jul-04 01:59 UTC

[R] Plotting the ASCII character set.

On Sat, 3 Jul 2021 09:40:28 +0200
Ivan Krylov <krylov.r00t at gmail.com> wrote:
> Hello Rolf Turner,
> 
> On Sat, 3 Jul 2021 14:02:59 +1200
> Rolf Turner <r.turner at auckland.ac.nz> wrote:
> 
> > Can anyone suggest how I might get my plot_ascii() function working
> > again?  Basically, it seems to me, the question is:  how do I
> > persuade R to read in "\260" as "\ub0" rather than "\xb0"?
> 
> Part of the problem is that the "\xb0" byte is not in ASCII, which
> covers only the lower half of possible 8-bit bytes. I guess that the
> strings containing bytes with highest bit set used to be interpreted
> as Latin-1 on your machine, but now get interpreted as UTF-8, which
> changes their meaning (in UTF-8, the highest bit being set indicates
> that there will be more bytes to follow, making the string invalid if
> there is none).
> 
> The good news is, since it's Latin-1, which is natively supported by
> R, there are even multiple options:
> 
> 1. Mark the string as Latin-1 by setting Encoding(a) <- 'latin1' and
> let R do the re-encoding if and when Pango asks it for a UTF-8-encoded
> string.
> 
> 2. Decode Latin-1 into the locale encoding by using iconv(a, 'latin1',
> '') (or set the third parameter to 'UTF-8', which would give almost
> the same result on a machine with a UTF-8 locale). The result is,
> again, a string where Encoding(a) matches the truth. Explicitly
> setting UTF-8 may be preferable on Windows machines running pre-UCRT
> builds of R where the locale encoding may not contain all Latin-1
> characters, but that's not a problem for you, as far as I know.
> 
> For any encoding other than Latin-1 or UTF-8, option (2) is still
> valid.
> 
> I have verified that your example works on my GNU/Linux system with a
> UTF-8 locale if I use either option.
Thanks Ivan. That solves most of the problem, but there are still
glitches. I get a plot OK, but a substantial number of the characters
are displayed as a wee rectangle containing a 2 x 2 array of digits
such as

    0 0
    8 0

Also note that there is a bit of difference between the results of using
Encoding() and the results of using iconv(). E.g. if I do

a <- "\x80"
b <- iconv(a,"latin1","UTF-8")
Encoding(a) <- "latin1"

then when I type "a" I get the Euro symbol "€", but when I type "b"
I get the string "\u0080".

But that doesn't really matter.  More problematic is the fact that if I
do either

    plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE,axes=FALSE)
    text(0.5,0.5,labels=a,cex=6)
or

    plot(0,0,type="n",xlim=c(0,1),ylim=c(0,1),ann=FALSE,axes=FALSE)
    text(0.5,0.5,labels=b,cex=6)

then I get a wee rectangle with 0 0 8 0 arranged in a 2 x 2 array inside.
(Setting cex=6 makes it easier for my ageing eyes to see what the
digits are.)

Is there any way that I can get the Euro symbol to display correctly in
such a graphic?

Thanks.

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
