You seem never to have told R or us what charset these data are in. I
think it is likely that they are being transferred in latin2 (like your
email), and you are running R in UTF-8 according to Sys.getlocale. So
what you need to do is to either
1) Run R in latin2
or
2) use iconv() to convert the results from latin2 to UTF-8.
UTF-8 is relatively new in the DBMS world. For ODBC, look at the bug
reports on the MySQL site. Using RODBC with UTF-8 locales is on my TODO
list, but of no urgency at all.
On Sun, 18 Feb 2007, janek0 wrote:
> Dear List
>
> In my short life as a beginning R-user i've encountered a following
> problem that i'm unable to solve myself:
>
> I have a database in MySQL containing table and field names as well as
> some data containing Polish accentuated characters (like ????),
> utf8-encoded. It works just fine with just any external query browser i
> can find, jdbc, odbc, native, whatever. Also mysql is happy about my
> charset. It also seems to be configured correctly:
>
> mysql> status
> --------------
> mysql Ver 14.12 Distrib 5.0.24a, for pc-linux-gnu (i486) using readline
> 5.1
>
> Connection id: 16
> Current database:
> Current user: root at localhost
> SSL: Not in use
> Current pager: stdout
> Using outfile: ''
> Using delimiter: ;
> Server version: 5.0.24a-Debian_9-log
> Protocol version: 10
> Connection: Localhost via UNIX socket
> Server characterset: utf8
> Db characterset: utf8
> Client characterset: utf8
> Conn. characterset: utf8
> UNIX socket: /var/run/mysqld/mysqld.sock
> Uptime: 1 hour 37 min 17 sec
>
> Threads: 2 Questions: 240 Slow queries: 0 Opens: 175 Flush tables: 1
> Open tables: 64 Queries per second avg: 0.041
>
> Yet if i use R's RODBC or RMySQL to connect to my database i can't
see
> these accentuated characters:
>
> library(RODBC)
> con <-odbcConnect("trybunal", uid="root",
pwd="mypassword")
> sqlTable(con)
>
> the output (abridged) is like that:
>
> TABLE_CAT TABLE_SCHEM TABLE_NAME TABLE_TYPE
> REMARKS
> 14 trybunal Wyk?adnia innych przepis<f3>w TABLE
MySQL
>
> instead of <F3> i should see "?" and instead of ? a
"?".
>
> It is just the same if i use RMySQL instead of RODBC:
>
> library(RMySQL)
> con <-dbConnect(dbDriver("MySQL"),
dbname="trybunal", username="root",
> password="mypassword")
> dbListTables(con)
>
> the output (abridged) is like that:
> [13] "Ustawa" "Wyk?adnia innych
przepis<f3>w"
>
> and if i use
> dbReadTable(con, "Metryczka")
>
> ("Metryczka" being one table in the database) i get:
>
> Error in make.names(as.character(names), allow_) :
> invalid multibyte string 11
>
> It works without error if i set LC_ALL to "C", but obviously
without
> Polish charset.
>
> Strange thing is that Sys.getlocale() gives me
>
>
[1]"LC_CTYPE=pl_PL.UTF-8;LC_NUMERIC=C;LC_TIME=pl_PL.UTF-8;LC_COLLATE=pl_PL.UTF-8;
> LC_MONETARY=pl_PL.UTF-8;LC_MESSAGES=pl_PL.UTF-8;
>
LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C"
>
> So it is utf8 all over the place. Also R works just fine with
> read.table() if the table contains utf8-encoded chars. Thus the problem
> is just with R-mysql connection. It seems therefore that dbi package
> does not support non-ascii charsets.
>
> Questions:
> 1. Is above conclusion correct or am i doing something wrong ?
> 2. If it is correct, is there any way to use table and field names as
> they are now (with non-ascii chars) in my SQL queries (e.g. SELECT
> `Wyk?adnia przedmiotu kontroli`.*) ?
>
> I can live with Polish characters missing in the output if i have to but
> i must address the database fields/tables. I can't change their names
of
> as this would mean rebuilding database frontend. I can't import data to
> R by exporting the database and then importing it via read.table because
> i want a "live" application and must keep things simple.
>
> Any help will be greatly appreciated.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595