thr3ads.net - R help - [R] Strange characters that block import [Oct 2009]

arnaud Mosnier

2009-Oct-14 12:25 UTC

[R] Strange characters that block import

Dear useRs,

I am trying to import a text file that contains some strange characters,
which come from another program misinterpreting foreign-language characters
(see below).

----------------------------------------
Here is an example of text with a line containing characters that break the
import:

name;number
zdsfbg;2
 ;3
dtryjh;4

----------------------------------------

R does not import the lines after those strange characters (i.e. it imports
only the first two lines: the header and the first line of data).

I have already tried importing with other encodings such as latin1 or UTF-8,
but that does not solve the problem.

Replacing those characters in a text editor before importing solves the
problem, but I do not want the users of my script to have to edit the text
before the analysis in R.

Any hints?

Thanks


Duncan Murdoch

2009-Oct-14 14:26 UTC

[R] Strange characters that block import

On 10/14/2009 8:25 AM, arnaud Mosnier wrote:
> Dear useRs,
> 
> I am trying to import a text file that contains some strange characters,
> which come from another program misinterpreting foreign-language characters
> (see below).
> 
> ----------------------------------------
> Here is an example of text with a line containing characters that break the
> import:
> 
> name;number
> zdsfbg;2
>  ;3
> dtryjh;4
> 
> ----------------------------------------
> 
> R does not import the lines after those strange characters (i.e. it imports
> only the first two lines: the header and the first line of data).
> 
> I have already tried importing with other encodings such as latin1 or UTF-8,
> but that does not solve the problem.
> 
> Replacing those characters in a text editor before importing solves the
> problem, but I do not want the users of my script to have to edit the text
> before the analysis in R.
> 
> Any hints?

Those funny characters are octal 032, Ctrl-Z.  Years ago that was 
defined on DOS/Windows as an end of file marker, and I guess our code 
still honours that.
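
To check whether a given file really contains Ctrl-Z bytes, one option is to read it as raw bytes and look for 0x1A (octal 032). The following is only a minimal sketch: the sample file is built in a temporary location to mimic the example in the post, and the exact bytes of the real file are an assumption.

```r
# Build a small sample file that mimics the example from the post.
# The Ctrl-Z bytes (0x1A) and their placement are assumed for illustration.
path <- tempfile(fileext = ".txt")
sample_bytes <- c(
  charToRaw("name;number\nzdsfbg;2\n"),
  as.raw(c(0x1a, 0x1a, 0x20, 0x1a, 0x1a)),  # Ctrl-Z, Ctrl-Z, space, Ctrl-Z, Ctrl-Z
  charToRaw(";3\ndtryjh;4\n")
)
writeBin(sample_bytes, path)

# Read the whole file as raw bytes, so nothing is treated as end of file.
bytes <- readBin(path, what = "raw", n = file.size(path))

# Byte offsets (1-based) of every Ctrl-Z in the file.
ctrl_z_pos <- which(bytes == as.raw(0x1a))
print(ctrl_z_pos)
```

If ctrl_z_pos is empty, the truncated import probably has some other cause.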

You can work around it by stating that you're reading from a binary 
file, not a text file:

f <- file("text.txt", "rb")

Then read.csv2(f) fails, but readLines(f) succeeds, so this works:

 > f <- file("c:/temp/test.txt", "rb")
 > read.csv2(textConnection(readLines(f)))
                name number
1            zdsfbg      2
2 \032\032 \032\032      3
3            dtryjh      4

 > close(f)

I don't know if there are any characters that would cause readLines to 
fail, but there might be, so I'd suggest replacing the buggy software 
that caused all the problems in the first place.
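
If the software cannot be replaced, a variation on the workaround above is to drop the Ctrl-Z bytes before parsing, so that users of a script never have to edit the file by hand. This is only a sketch under some assumptions: the helper name read_csv2_without_ctrl_z is hypothetical, the file is assumed to fit in memory and to contain no embedded NUL bytes, and discarding the Ctrl-Z bytes is assumed to be acceptable for the analysis.

```r
# Hypothetical helper (not part of R): read a semicolon-separated file
# after removing every Ctrl-Z (0x1A) byte from it.
read_csv2_without_ctrl_z <- function(path, ...) {
  # Read the file as raw bytes so Ctrl-Z is not treated as end of file.
  bytes <- readBin(path, what = "raw", n = file.size(path))
  # Keep every byte except Ctrl-Z.
  cleaned <- bytes[bytes != as.raw(0x1a)]
  # Parse the cleaned text; rawToChar() would fail on embedded NUL bytes.
  read.csv2(text = rawToChar(cleaned), ...)
}

# Usage on a sample file built to mimic the example in the post.
path <- tempfile(fileext = ".txt")
writeBin(
  c(charToRaw("name;number\nzdsfbg;2\n"),
    as.raw(c(0x1a, 0x1a, 0x20, 0x1a, 0x1a)),
    charToRaw(";3\ndtryjh;4\n")),
  path
)
dat <- read_csv2_without_ctrl_z(path, stringsAsFactors = FALSE)
print(dat)
```

All three data rows are read. The name field on the affected row is left with whatever remains once the Ctrl-Z bytes are gone, so it may still need cleaning afterwards.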

Duncan Murdoch


