thr3ads.net - R help - [R] Help Read File With Odd Characters [Nov 2012]

Lee Hachadoorian

2012-Nov-08 07:11 UTC

[R] Help Read File With Odd Characters

I have a large (105MB) data file, tab-delimited with a header. There are 
some odd characters at the beginning of the file that are preventing it 
from being read by R.

 > dfTemp = read.delim(filename)
Error in make.names(col.names, unique = TRUE) :
invalid multibyte string at '<ff><fe>m'

When I view the file with head, I see:

??muni_code parcel_id?
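Viewing the raw bytes, instead of letting head render them through the terminal, shows what the two `?` characters actually are. This is a minimal sketch, assuming a POSIX shell with `iconv`, `head -c` and `od` available. `sample.txt` is a hypothetical small file built on the spot to mimic the start of the real one, since the original data is not available:

```shell
#!/bin/sh
# Build a tiny stand-in file (hypothetical name sample.txt): a little-endian
# byte-order mark (bytes FF FE, written as octal escapes) followed by a
# header line encoded as UTF-16LE.
printf '\377\376' > sample.txt
printf 'muni_code\tparcel_id\n' | iconv -f UTF-8 -t UTF-16LE >> sample.txt

# Dump the first 8 bytes in hex. For this stand-in they are:
# ff fe 6d 00 75 00 6e 00  (BOM, then "m", "u", "n" each followed by 00)
head -c 8 sample.txt | od -An -tx1

# Check only the first two bytes for the little-endian BOM.
bom=$(head -c 2 sample.txt | od -An -tx1 | tr -d ' \n')
if [ "$bom" = "fffe" ]; then
  echo "UTF-16LE/UCS-2LE byte-order mark found"
fi
```

On the real file only the `head -c 8 ... | od` line is needed. Seeing `ff fe` at the start, with a `00` after each ASCII letter, points to a little-endian UCS-2/UTF-16 file. That matches the `<ff><fe>m` in the R error message, where R tried to read those bytes as text in the native encoding.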

The file is too large to edit in a graphical text editor (gedit). I 
tried just dropping the header row with

sed '1 d' <old.txt >new.txt

but then

 > dfTemp = read.delim(filename)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
empty beginning of file

I tried some other shenanigans with sed (with which I am not really 
experienced) but did not get a usable file. Does anyone have any ideas 
for how to (a) directly read this into R, skipping the offending line or 
characters, or (b) preprocess it so that I can read it into R?

Best,
--Lee

R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Linux Mint 13

-- 
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu

Prof Brian Ripley

2012-Nov-08 07:51 UTC

On 08/11/2012 07:11, Lee Hachadoorian wrote:
> I have a large (105MB) data file, tab-delimited with a header. There are
> some odd characters at the beginning of the file that are preventing it
> from being read by R.
>
>  > dfTemp = read.delim(filename)
> Error in make.names(col.names, unique = TRUE) :
> invalid multibyte string at '<ff><fe>m'
>
> When I view the file with head, I see:
>
> ??muni_code parcel_id?
>
> The file is too large to edit in a graphical text editor (gedit). I
> tried just dropping the header row with
>
> sed '1 d' <old.txt >new.txt
>
> but then
>
>  > dfTemp = read.delim(filename)
> Error in read.table(file = file, header = header, sep = sep, quote = quote, :
> empty beginning of file
>
> I tried some other shenanigans with sed (with which I am not really
> experienced) but did not get a usable file. Does anyone have any ideas
> for how to (a) directly read this into R, skipping the offending line or
> characters, or (b) preprocess it so that I can read it into R?
That is a byte-order mark (BOM) in UCS-2 encoding.  Was this file created on Windows?

If so, try using iconv to convert it to UTF-8, or in R use

read.delim(filename, fileEncoding = "UCS-2LE")
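The iconv route can be sketched as follows, assuming GNU `iconv` (glibc or GNU libiconv). `old.txt` and `new.txt` are placeholder names, and a tiny stand-in for the real file is generated first so the sketch runs on its own. Naming the source encoding `UTF-16` with no byte-order suffix asks iconv to take the byte order from the BOM and consume it rather than copy it into the output. This is a swap from the `UCS-2LE` named above (which the poster reports also worked); the two decode plain ASCII text such as these column names identically.

```shell
#!/bin/sh
# Hypothetical stand-in for the real 105MB file: a little-endian BOM
# (bytes FF FE) followed by two tab-delimited lines encoded as UTF-16LE.
printf '\377\376' > old.txt
printf 'muni_code\tparcel_id\n1\t2\n' | iconv -f UTF-8 -t UTF-16LE >> old.txt

# Convert to UTF-8. "UTF-16" (no LE/BE suffix) tells iconv to read the BOM,
# pick the byte order from it, and leave it out of the converted text.
iconv -f UTF-16 -t UTF-8 old.txt > new.txt

# The header line should now be plain ASCII with no leading BOM bytes.
head -n 1 new.txt
```

With the converted file, a plain `read.delim("new.txt")` in R should then see `muni_code` as the first column name.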

>
> Best,
> --Lee
>
> R version 2.14.1 (2011-12-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Linux Mint 13
Yes, but what locale?  See the 'at a minimum' information asked for in 
your posting guide.



-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Lee Hachadoorian

2012-Nov-08 22:13 UTC

On 11/08/2012 02:51 AM, Prof Brian Ripley wrote:
> On 08/11/2012 07:11, Lee Hachadoorian wrote:
>> I have a large (105MB) data file, tab-delimited with a header. There are
>> some odd characters at the beginning of the file that are preventing it
>> from being read by R.
>>
> That is a byte-order mark (BOM) in UCS-2 encoding.  Was this file created on Windows?
>
> If so, try using iconv to convert it to UTF-8, or in R use
>
> read.delim(filename, fileEncoding = "UCS-2LE")
Perfect. I tried it both ways, and both iconv and the fileEncoding 
parameter did the trick.

As far as I know the file (which was provided by a public agency) was 
created in Windows.

Thanks,
--Lee

-- 
Lee Hachadoorian
Assistant Professor in Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu
