On Thu, Jul 7, 2016 at 10:11 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
> On 07/07/2016 10:57 AM, Hadley Wickham wrote:
>>
>> If you print:
>>
>> "\xc9\x82\xbf"
>>
>> you get
>>
>> "\u0242\xbf"
>>
>> But if you try and evaluate that string you get:
>>
>> > "\u0242\xbf"
>>
>> Error: mixing Unicode and octal/hex escapes in a string is not allowed
>>
>> (Probably will only happen on mac/linux with default utf-8 encoding)
>
>
> I'm not sure what should happen here, but that's not a legal string in a
> UTF-8 locale, so it's not too surprising that things go wonky.
Here's a bit more context on how I got that sequence of bytes:
x <- "?????"
y <- iconv(x, to = "Shift-JIS")
Encoding(y)
y
I did this to create an example to demonstrate how to handle encoding
problems, and it's a bit frustrating that I have to manually mangle the
printed string before I can re-use it in another session. Maybe
strings with unknown encoding shouldn't be printed with Unicode escapes?
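As a rough sketch of the manual workaround, assuming nothing beyond base R:
the helper below, hex_literal(), is a hypothetical name and not an existing
function. It writes every byte as a \x escape, so the resulting literal
never mixes \u and \x escapes. The sample bytes are the three from the
printed example above.

```r
# Hypothetical helper (not part of base R): emit every byte of a string
# as a \x escape, so the literal can be pasted into another session.
hex_literal <- function(s) {
  bytes <- charToRaw(s)
  # as.character() on a raw vector gives two-digit hex, e.g. "c9"
  paste0('"', paste0("\\x", as.character(bytes), collapse = ""), '"')
}

# The three bytes from the example above; together they are not valid UTF-8.
y <- "\xc9\x82\xbf"
lit <- hex_literal(y)
cat(lit, "\n")

# Parsing the literal should give back the same bytes.
z <- eval(parse(text = lit))
identical(charToRaw(z), charToRaw(y))
```

This only sidesteps the problem by hand; it doesn't change what print() does
with a string whose encoding is unknown.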
Hadley
--
http://hadley.nz