thr3ads.net - R help - [R] charToRaw("Œ") is not 8C in R console [Dec 2013]


水静流深

2013-Dec-13 07:03 UTC

[R] charToRaw("Œ") is not 8C in R console

At http://www.ascii-code.com/ you can see that the hex value of Œ is 8C.

Why does my R console show something else?
charToRaw("Œ")
 [1] c5 92
Why is the result not 8C?

Prof Brian Ripley

2013-Dec-13 07:59 UTC


[R] charToRaw("Œ") is not 8C in R console

On 13/12/2013 07:03, 水静流深 wrote:
> in http://www.ascii-code.com/, you can see the hex value of Å’ is 8C,
I don't see that: that is two characters and they are C5 and 92 in that 
table.  8C is an OE ligature there.

And what the 'hex value' is depends on the locale: see the preamble of 
that table (which seems to assume everyone uses CP1252): you have not 
stated yours.
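
A minimal sketch of how one might report the locale and encoding an R session is using. The variable names are illustrative, and the printed values will differ from machine to machine:

```r
# Character-type locale of this session, e.g. "en_US.UTF-8" on many
# Linux/macOS systems or a code-page based name on Windows.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)

# l10n_info() reports whether the session is multibyte / UTF-8 / Latin-1.
info <- l10n_info()
print(info[["UTF-8"]])

# Encoding() shows how a particular string is marked. A string written
# with a \u escape is stored as UTF-8 whatever the locale is.
oe <- "\u0152"
print(Encoding(oe))
```

Including this output in a question about bytes and encodings makes the answer much easier to pin down.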
> why in my R console ?
> charToRaw("Å’")
>   [1] c5 92
>   is not 8C ?
Because R is better at looking up hex values than you are.

I get

 > charToRaw("Å’")
[1] c3 85 e2 80 99

in UTF-8 (as will almost everyone not using Windows).
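
The five bytes above are consistent with the two bytes c5 92 having been read as CP1252 characters somewhere along the way and then re-encoded as UTF-8. A minimal sketch of that round trip, assuming the platform's iconv recognises the name "CP1252" (the variable names are illustrative):

```r
# U+0152 written as an escape so the example does not depend on the
# encoding of the source file or console.
oe <- "\u0152"

# The UTF-8 representation of the character: two bytes.
utf8_bytes <- charToRaw(enc2utf8(oe))
print(utf8_bytes)

# Pretend those two bytes are CP1252 text (one character per byte) and
# convert that text to UTF-8. iconv() accepts a list of raw vectors.
misread <- iconv(list(utf8_bytes), from = "CP1252", to = "UTF-8")

# The misread text is two characters, and their UTF-8 bytes are what
# charToRaw() then reports.
mangled_bytes <- charToRaw(misread)
print(mangled_bytes)
```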

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

peter dalgaard

2013-Dec-13 09:07 UTC


[R] charToRaw("Œ") is not 8C in R console

On 13 Dec 2013, at 08:03 , 水静流深 <1248283536 at qq.com> wrote:
> in http://www.ascii-code.com/, you can see the the hex value of Œ is 8C,
> 
(Looks like Brian got his version mangled in transmission.)

Anything above 7F is not ASCII.

Various "8-bit extensions" put various non-ASCII characters at various
places in the range 80-FF. Your reference shows Windows-1252 (CP1252), a
Microsoft extension of the Latin-1 encoding, which covers the Western
European languages. That was useful for a while [*], until the West and the
East began talking to each other and found that the other party's documents
were putting different characters in the same places of different encodings.

UTF-8 uses multibyte sequences like c5 92 to represent extra characters, which
allows you to have more than 128 of them.
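
A minimal sketch of the difference, assuming the platform's iconv recognises the name "CP1252" (the variable names are illustrative):

```r
# U+0152 written as an escape so the result does not depend on how the
# script or console is encoded.
oe <- "\u0152"

# In UTF-8 every character outside ASCII takes two or more bytes.
utf8_bytes <- charToRaw(enc2utf8(oe))
print(utf8_bytes)

# In CP1252 the same character is one byte in the 80-FF range.
cp1252_bytes <- iconv(oe, from = "UTF-8", to = "CP1252", toRaw = TRUE)[[1]]
print(cp1252_bytes)

# A plain ASCII character is a single byte below 80 in both encodings.
ascii_bytes <- charToRaw("A")
print(ascii_bytes)
```

The same character therefore has different "hex values" depending on which encoding is used to write it out, which is why a CP1252 table and a UTF-8 console disagree.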

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=256
http://www.joelonsoftware.com/articles/Unicode.html

-pd

[*] A short while, actually, because it was preceded by another encoding mess
known as IBM Code Pages. Famously, in this country, IBM computers (and many 3rd
party printers!) shipped with a code page missing the O-slash Danish character
which got printed as "cent"/"Yen"!

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
