On 18/04/2009 1:18 PM, Hilmar Berger wrote:
> Hi all,
>
> I have problems reading Unicode (UTF-16) coded tables in R 2.8.1 under
> Windows Vista.
>
> Imagine the following table:
>
> a b c d
> X 1,2 1,3 1,4
> Y 2,2 2,3 2,4
> Z 3,2 3,3 3,4
>
> Usually I would use the following code to read the table:
>
> t = read.table("test.txt", header=T, sep="\t",dec=",")
>
> This works well if I create the table using Notepad (the text will be in
> UTF-8 or ASCII, then).
I haven't tried 2.8.1 (which is obsolete, since yesterday :-), but in
2.9.0 it works fine if I use the fileEncoding argument to read.table.
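
For reference, a minimal sketch of such a call. The temporary file and its contents are made up here to stand in for the OpenOffice export, and UTF-16LE with a byte-order mark is an assumption based on the hexdump in the original post. Only the fileEncoding argument is the point:

```r
# Build a small UTF-16LE file with a byte-order mark (BOM), standing in for
# the OpenOffice export. Path and contents are hypothetical.
path <- tempfile(fileext = ".csv")
txt  <- "a\tb\tc\td\nX\t1,2\t1,3\t1,4\nY\t2,2\t2,3\t2,4\nZ\t3,2\t3,3\t3,4\n"
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), path)

# fileEncoding tells read.table how to decode the file while reading it.
tab <- read.table(path, header = TRUE, sep = "\t", dec = ",",
                  fileEncoding = "UTF-16")
str(tab)
```

With this, columns b, c and d should come back as numeric, since dec="," is applied after the text has been decoded.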
Duncan Murdoch
> However, if I use e.g. OpenOffice scalc to create a spreadsheet holding
> the same data and save this data as text (using tabs as separators, no
> quotes and using Unicode encoding) the command above gives this:
>
> > t = read.table("test.csv", header=T, sep="\t",dec=",")
> > t
> ??a
> 1 NA
> 2 NA
> 3 NA
>
> I tried to play with the "encoding" parameter but that would not change
> anything.
>
> The file from OpenOffice is in UTF-16, as shown by hexdump:
> $ hexdump test.csv
> 0000000 feff 0061 0009 0062 0009 0063 0009 0064
> 0000010 000d 000a 0058 0009 0031 002c 0032 0009
> 0000020 0031 002c 0033 0009 0031 002c 0034 000d
> 0000030 000a 0059 0009 0032 002c 0032 0009 0032
> 0000040 002c 0033 0009 0032 002c 0034 000d 000a
> 0000050 005a 0009 0033 002c 0032 0009 0033 002c
> 0000060 0033 0009 0033 002c 0034 000d 000a
> 000006e
>
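
The same check can be done from within R by reading the first two bytes. Note that hexdump prints 16-bit words in host byte order by default, so "feff" on a little-endian machine (assumed here, since Sys.info() reports x86) corresponds to the bytes ff fe, i.e. a UTF-16LE byte-order mark. A rough sketch, using a made-up stand-in file:

```r
# Hypothetical stand-in file: UTF-16LE text preceded by the BOM bytes ff fe.
path <- tempfile(fileext = ".csv")
body <- iconv("a\tb\n", from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), path)

# Read the first two bytes and compare them with the two UTF-16 BOMs.
bom <- readBin(path, what = "raw", n = 2L)
enc <- if (identical(bom, as.raw(c(0xff, 0xfe)))) {
  "UTF-16LE"
} else if (identical(bom, as.raw(c(0xfe, 0xff)))) {
  "UTF-16BE"
} else {
  NA_character_  # no UTF-16 byte-order mark found
}
print(bom)
print(enc)
```

This only looks for a BOM; a UTF-16 file written without one would need to be identified some other way.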
> I tried to read the file using file/readLines, which seemed to work
> after specifying the encoding:
>
> > a = file("test.csv",open="r", encoding="UTF-16")
> > b = readLines(a)
> > b
> [1] "a\tb\tc\td" "X\t1,2\t1,3\t1,4" "Y\t2,2\t2,3\t2,4" "Z\t3,2\t3,3\t3,4"
>
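
Lines read this way can be handed on to read.table, for example through a textConnection. This is a sketch of one possible workaround, again with a made-up stand-in file (UTF-16LE with a BOM is assumed); it is not meant to describe what read.table does internally:

```r
# Hypothetical stand-in for the OpenOffice export: UTF-16LE with a BOM.
path <- tempfile(fileext = ".csv")
txt  <- "a\tb\tc\td\nX\t1,2\t1,3\t1,4\nY\t2,2\t2,3\t2,4\nZ\t3,2\t3,3\t3,4\n"
body <- iconv(txt, from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), path)

# Step 1: decode the file into ordinary R strings via an encoded connection.
con <- file(path, open = "r", encoding = "UTF-16")
lines <- readLines(con)
close(con)

# Step 2: parse the already-decoded lines as a tab-separated table.
tc <- textConnection(lines)
tab <- read.table(tc, header = TRUE, sep = "\t", dec = ",")
close(tc)
print(tab)
```

The decoding and the table parsing are kept as two separate steps here, so the encoding only has to be right for the connection.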
> Looking at the code of readtable.R in R-2.8.1 and R-2.9.0 it seems that
> the encoding does not get passed through in the second call to scan()
> appearing in the code.
>
> I'm not sure if this is a bug or if I'm doing something wrong here.
>
> Regards,
> Hilmar
>
> ------------------
> My system and R settings are:
>
> > sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] tools_2.8.1
>
> > Sys.info()
> sysname release version nodename
> "Windows" "Vista" "build 6001, Service Pack 1" "PC"
> machine login user
> "x86"
>
> > options("encoding")
> $encoding
> [1] "native.enc"
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.