thr3ads.net - R help - [R] Reading a txt file from internet [Sep 2024]

Duncan Murdoch

2024-Sep-07 21:43 UTC

[R] Reading a txt file from internet

On 2024-09-07 4:52 p.m., Jeff Newmiller via R-help wrote:
> When you specify LE in the encoding type, you are logically telling the
> decoder that you know the two-byte pairs are in little-endian order... which
> could override whatever the byte-order-mark was indicating. If the BOM indicated
> big-endian then the file decoding would break. If there is a BOM, don't
> override it unless you have to (e.g. for a wrong BOM)... leave off the LE unless
> you really need it.

That sounds like good advice, but it doesn't work:

  > read.delim(
  +     'https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
  +     fileEncoding = "UTF-16"
  + )
  [1] time
  [2] vendor.?????........??........?.??........?.??.?..?.....?..?..?...?.?..?..?...?.??....?...?.??.

and so on.
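
When the "UTF-16" decoder misbehaves like this, one possible workaround is to bypass the connection-level decoding: read the raw bytes, drop the BOM by hand, and convert with iconv(). The following is a minimal sketch, not something posted in the thread. It uses a small made-up temporary file in place of the course URL, and the helper name read_utf16le_bytes is hypothetical.

```r
# Build a small stand-in file: BOM (ff fe) followed by UTF-16LE text.
# The contents are made up; they are not the real employee.txt data.
txt <- "time\tvendor\n1\t10\n2\t20\n"
utf16le <- as.vector(rbind(charToRaw(txt), as.raw(0)))  # ASCII -> UTF-16LE
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xff, 0xfe)), utf16le), tmp)

# Hypothetical helper: decode a BOM-prefixed UTF-16LE file at the byte level.
read_utf16le_bytes <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  # Drop the little-endian BOM if it is present.
  if (length(bytes) >= 2 && identical(bytes[1:2], as.raw(c(0xff, 0xfe)))) {
    bytes <- bytes[-(1:2)]
  }
  # iconv() accepts a list of raw vectors and returns a character vector.
  decoded <- iconv(list(bytes), from = "UTF-16LE", to = "UTF-8")
  read.delim(text = decoded)
}

employee <- read_utf16le_bytes(tmp)
print(employee)
```

For the remote file, the bytes would first have to be fetched to disk, for example with download.file(..., mode = "wb"); that step is left out here so the sketch runs without network access.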
> On September 7, 2024 1:22:23 PM PDT, Enrico Schumann <es at enricoschumann.net> wrote:
>> On Sun, 08 Sep 2024, Christofer Bogaso writes:
>>
>>> Hi,
>>>
>>> I am trying to read the data from
>>> https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt
>>> without any success. Below is the error I am getting:
>>>
>>> > read.delim('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt')
>>> Error in make.names(col.names, unique = TRUE) :
>>>    invalid multibyte string at '<ff><fe>t'
>>> In addition: Warning messages:
>>> 1: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 1 appears to contain embedded nulls
>>> 2: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 2 appears to contain embedded nulls
>>> 3: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 3 appears to contain embedded nulls
>>> 4: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 4 appears to contain embedded nulls
>>> 5: In read.table(file = file, header = header, sep = sep, quote = quote,  :
>>>    line 5 appears to contain embedded nulls
>>>
>>> Is there any way to read this data directly into R?
>>>
>>> Thanks for your time
>>>
>>
>> The <ff><fe> looks like a byte-order mark
>> (https://en.wikipedia.org/wiki/Byte_order_mark).
>> Try this:
>>
>>     fn <- file('https://online.stat.psu.edu/onlinecourses/sites/stat501/files/ch15/employee.txt',
>>                encoding = "UTF-16LE")
>>     read.delim(fn)
>>
>
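
For anyone who wants to try the quoted suggestion without depending on the remote server, here is a small self-contained sketch of the same pattern. It is an illustration only: the temporary file and its contents are made up and merely mimic a BOM-prefixed UTF-16LE file like the one discussed.

```r
# Stand-in for the remote file: BOM (ff fe) plus UTF-16LE text (made-up data).
txt <- "time\tvendor\n1\t10\n2\t20\n"
tmp <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xff, 0xfe)),
           as.vector(rbind(charToRaw(txt), as.raw(0)))), tmp)

# Same pattern as the suggestion: declare the encoding on the connection,
# then hand the connection to read.delim(), which opens and closes it.
fn <- file(tmp, encoding = "UTF-16LE")
employee <- read.delim(fn)
str(employee)

# The encoding can also be passed through read.delim() via fileEncoding.
employee2 <- read.delim(tmp, fileEncoding = "UTF-16LE")
```

Both calls declare the byte order explicitly, which is the case reported in the thread as working; whether plain "UTF-16" behaves the same appears to depend on the platform's decoder.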

Jeff Newmiller

2024-Sep-07 23:37 UTC

[R] Reading a txt file from internet

I tried it on R 4.4.1 on Linux Mint 21.3 just before I posted it, and I just
tried it on R 3.4.2 on Ubuntu 16.04 and R 4.3.2 on Windows 11 just now and it
works on all of them.

I don't have a big-endian machine to test on, but the Unicode spec says to
honor the BOM and, if there isn't one, to assume that the data is big-endian.
In this case there is a BOM, so perhaps your machine has a buggy decoder?
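
To check which byte order a file's BOM actually declares, the first two bytes can be inspected directly. This is a rough sketch rather than anything from the thread; the function name utf16_byte_order is hypothetical, and it only distinguishes the two UTF-16 BOMs.

```r
# Hypothetical helper: classify the first two bytes of a raw vector.
utf16_byte_order <- function(bytes) {
  head2 <- bytes[seq_len(min(2L, length(bytes)))]
  if (identical(head2, as.raw(c(0xff, 0xfe)))) {
    "little-endian"
  } else if (identical(head2, as.raw(c(0xfe, 0xff)))) {
    "big-endian"
  } else {
    "no BOM"
  }
}

# The bytes from the error message: <ff><fe> followed by 't' (0x74 0x00).
from_error <- utf16_byte_order(as.raw(c(0xff, 0xfe, 0x74, 0x00)))
print(from_error)

# For a file on disk, read just the first two bytes (made-up temp file here).
tmp <- tempfile()
writeBin(as.raw(c(0xfe, 0xff, 0x00, 0x74)), tmp)
from_file <- utf16_byte_order(readBin(tmp, what = "raw", n = 2L))
print(from_file)
```

The <ff><fe> pair shown in the original error corresponds to the little-endian case, which is consistent with "UTF-16LE" decoding the file.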

-- 
Sent from my phone. Please excuse my brevity.
