On 2024-09-07 7:37 p.m., Jeff Newmiller wrote:
> I tried it on R 4.4.1 on Linux Mint 21.3 just before I posted it, and I
> just tried it on R 3.4.2 on Ubuntu 16.04 and R 4.3.2 on Windows 11 just now and
> it works on all of them.
>
> I don't have a big-endian machine to test on, but the Unicode spec says
> to honor the BOM and if there isn't one to assume that it is big-endian
> data. But in this case there is a BOM so your machine has a buggy decoder?

Sounds like it! I did it on a Mac running R 4.4.1.
Duncan Murdoch
>
> On September 7, 2024 2:43:24 PM PDT, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> On 2024-09-07 4:52 p.m., Jeff Newmiller via R-help wrote:
>>> When you specify LE in the encoding type, you are logically telling
>>> the decoder that you know the two-byte pairs are in little-endian order... which
>>> could override whatever the byte-order-mark was indicating. If the BOM indicated
>>> big-endian then the file decoding would break. If there is a BOM, don't
>>> override it unless you have to (e.g. for a wrong BOM)... leave off the LE unless
>>> you really need it.
>>
>> That sounds like good advice, but it doesn't work:
>>
>> > read.delim(
>> +   'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
>> +   fileEncoding = "UTF-16"
>> + )
>> [1] time
>>
>> [2] vendor.?????........??........?.??........?.??.?..?.....?..?..?...?.?..?..?...?.??....?...?.??.
>>
>> and so on.
>>>
>>> On September 7, 2024 1:22:23 PM PDT, Enrico Schumann <es at enricoschumann.net> wrote:
>>>> On Sun, 08 Sep 2024, Christofer Bogaso writes:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to read the data from
>>>>>
>>>>> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt
>>>>> without any success. Below is the error I am getting:
>>>>>
>>>>> > read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt')
>>>>>
>>>>> Error in make.names(col.names, unique = TRUE) :
>>>>>   invalid multibyte string at '<ff><fe>t'
>>>>>
>>>>> In addition: Warning messages:
>>>>>
>>>>> 1: In read.table(file = file, header = header, sep = sep, quote = quote, :
>>>>>   line 1 appears to contain embedded nulls
>>>>>
>>>>> 2: In read.table(file = file, header = header, sep = sep, quote = quote, :
>>>>>   line 2 appears to contain embedded nulls
>>>>>
>>>>> 3: In read.table(file = file, header = header, sep = sep, quote = quote, :
>>>>>   line 3 appears to contain embedded nulls
>>>>>
>>>>> 4: In read.table(file = file, header = header, sep = sep, quote = quote, :
>>>>>   line 4 appears to contain embedded nulls
>>>>>
>>>>> 5: In read.table(file = file, header = header, sep = sep, quote = quote, :
>>>>>   line 5 appears to contain embedded nulls
>>>>>
>>>>> Is there any way to read this data directly into R?
>>>>>
>>>>> Thanks for your time
>>>>>
>>>>
>>>> The <ff><fe> looks like a byte-order mark
>>>> (https://en.wikipedia.org/wiki/Byte_order_mark).
>>>> Try this:
>>>>
>>>> fn <- file('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
>>>>            encoding = "UTF-16LE")
>>>> read.delim(fn)
>>>>
>>>
>>
>
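
Since plain "UTF-16" evidently behaves differently from one platform's decoder to the next, one way to sidestep the question is to look at the first two bytes yourself and then pass an explicit byte order. The following is only a rough sketch of that idea, not something anyone in the thread ran: bom_encoding() is a hypothetical helper name, and the small two-column file is made-up sample data written locally so that the example does not depend on the PSU URL.

```r
# Build a small UTF-16LE file that starts with a BOM (ff fe).
# The contents are invented sample data, not the real employee.txt.
txt <- "time\tvendor\n1\t322\n2\t317\n"
tmp <- tempfile(fileext = ".txt")
payload <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), payload), tmp)

# Hypothetical helper: map the first two bytes of a file to an
# explicit UTF-16 encoding name, or NA if no UTF-16 BOM is present.
bom_encoding <- function(path) {
  bom <- readBin(path, what = "raw", n = 2L)
  if (identical(bom, as.raw(c(0xff, 0xfe)))) {
    "UTF-16LE"
  } else if (identical(bom, as.raw(c(0xfe, 0xff)))) {
    "UTF-16BE"
  } else {
    NA_character_
  }
}

enc <- bom_encoding(tmp)

# Read the file with the byte order the BOM reported.
dat <- read.delim(tmp, fileEncoding = enc)
```

For the file in this thread you would presumably download it first (for example with download.file(..., mode = "wb")) and call the helper on the local copy. If the helper returns NA there is no UTF-16 BOM and you are back to choosing the encoding by hand.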