thr3ads.net - R devel - [Rd] String encoding problem [Jul 2016]

Hadley Wickham

2016-Jul-07 14:57 UTC

[Rd] String encoding problem

If you print:

"\xc9\x82\xbf"

you get

 "\u0242\xbf"

But if you try and evaluate that string you get:
>  "\u0242\xbf"
Error: mixing Unicode and octal/hex escapes in a string is not allowed

(Probably will only happen on mac/linux with default utf-8 encoding)

Hadley

-- 
http://hadley.nz

Duncan Murdoch

2016-Jul-07 15:11 UTC

On 07/07/2016 10:57 AM, Hadley Wickham wrote:
> If you print:
>
> "\xc9\x82\xbf"
>
> you get
>
>  "\u0242\xbf"
>
> But if you try and evaluate that string you get:
>
>>  "\u0242\xbf"
> Error: mixing Unicode and octal/hex escapes in a string is not allowed
>
> (Probably will only happen on mac/linux with default utf-8 encoding)
I'm not sure what should happen here, but that's not a legal string in a
UTF-8 locale, so it's not too surprising that things go wonky.

Duncan Murdoch
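
Editorial note: a minimal sketch of why those three bytes are not a legal UTF-8 string. It assumes R 3.3.0 or later (for `validUTF8()`) and builds the bytes with `as.raw()` so that no escape parsing is involved; the variable names are illustrative only.

```r
# The three bytes from the thread, built directly rather than via "\x" escapes.
bytes <- as.raw(c(0xc9, 0x82, 0xbf))
x <- rawToChar(bytes)

# The first two bytes, 0xc9 0x82, are a valid UTF-8 encoding of U+0242,
# which is why print() can show them as "\u0242".
code_point <- utf8ToInt(rawToChar(bytes[1:2]))
sprintf("U+%04X", code_point)

# The trailing 0xbf is a continuation byte with no lead byte, so the
# string as a whole is not valid UTF-8 and that byte is shown as "\xbf".
validUTF8(x)
```

Printing therefore mixes a `\u` escape for the decodable prefix with a `\x` escape for the leftover byte, which is the combination the parser rejects.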

Hadley Wickham

2016-Jul-07 15:40 UTC

On Thu, Jul 7, 2016 at 10:11 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 07/07/2016 10:57 AM, Hadley Wickham wrote:
>>
>> If you print:
>>
>> "\xc9\x82\xbf"
>>
>> you get
>>
>>  "\u0242\xbf"
>>
>> But if you try and evaluate that string you get:
>>
>>>  "\u0242\xbf"
>>
>> Error: mixing Unicode and octal/hex escapes in a string is not allowed
>>
>> (Probably will only happen on mac/linux with default utf-8 encoding)
>
>
> I'm not sure what should happen here, but that's not a legal string in a
> UTF-8 locale, so it's not too surprising that things go wonky.

Here's a bit more context on how I got that sequence of bytes:

x <- "?????"
y <- iconv(x, to = "Shift-JIS")
Encoding(y)
y

I did this to create an example to demonstrate how to handle encoding
problems, and it's a bit frustrating that I have to manually mangle the
string in order to be able to re-use it in another session.  Maybe
strings with unknown encoding shouldn't use Unicode escapes?

Hadley

-- 
http://hadley.nz
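
Editorial note: one possible way to avoid the manual mangling described above, sketched under assumptions. `hex_literal()` is a hypothetical helper, not part of base R; it writes every byte as a `\xNN` escape so the resulting literal never mixes `\u` and `\x` escapes. Whether this suits a given workflow is a judgment call, since the literal is less readable than the printed form.

```r
# Hypothetical helper: render each byte of a string as a \xNN escape.
hex_literal <- function(s) {
  bytes <- charToRaw(s)
  escapes <- sprintf("\\x%02x", as.integer(bytes))
  paste0('"', paste0(escapes, collapse = ""), '"')
}

# The byte sequence from the thread, built without escape parsing.
y <- rawToChar(as.raw(c(0xc9, 0x82, 0xbf)))

# A literal that can be pasted into another session.
lit <- hex_literal(y)
cat(lit, "\n")

# Parsing the literal gives back the same bytes.
z <- eval(parse(text = lit))
identical(charToRaw(z), charToRaw(y))
```

The literal carries only bytes, so the declared encoding is not preserved; if it matters, it would need to be set again with `Encoding<-` after parsing.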
