Hi all,

I have to generate some test data for import into an SQL database. The database is meant for web-based data entry in a study taking place in a German-speaking region, so the factor levels of the variables include umlauts. The variables in the data frame t.muster are generated, for example, like this:

  t.muster$screening <- rep("ausgefüllt", 50)

and exported to a .csv file by:

  write.table(t.muster, "MakeMuster041006/MusterDaten.csv",
              col.names = FALSE, row.names = FALSE, na = "", sep = ";")

After export, the factor level of t.muster$screening that contains an umlaut looks like this in the SQL database as well as in an Excel spreadsheet:

  ausgefÃ¼llt

This looks like a conflict between encodings, but as far as I can tell my locales are set correctly. I tried something like:

  t.muster <- lapply(t.muster, iconv, "ISO8859-1", "ISO8859-15")

but it did not work. My locales are:

  > Sys.getlocale()
  [1] "LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C;LC_TIME=German_Switzerland.1252"

and I am running R on:

  > R.version
                 _
  platform       i386-pc-mingw32
  arch           i386
  os             mingw32
  system         i386, mingw32
  status
  major          2
  minor          3.1
  year           2006
  month          06
  day            01
  svn rev        38247
  language       R
  version.string Version 2.3.1 (2006-06-01)

I'd be glad if someone could help me out.

Thanks in advance.
Christian
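One way to take the locale out of the picture is to state the output encoding of the .csv file explicitly. The following is a minimal sketch, assuming a current R version in which write.table() accepts a fileEncoding argument (the R 2.3.1 shown above may predate it). The one-column t.muster and the temporary file path are illustrative stand-ins, and whether UTF-8 is the right target depends on what the SQL import tool expects.

```r
# Illustrative stand-in for the poster's data frame (assumption: one column).
t.muster <- data.frame(screening = rep("ausgef\u00fcllt", 50),
                       stringsAsFactors = TRUE)

# Illustrative path; the original wrote to "MakeMuster041006/MusterDaten.csv".
out <- file.path(tempdir(), "MusterDaten.csv")

# fileEncoding re-encodes the text as it is written, so the bytes in the
# file no longer depend on the session's locale (e.g. Windows-1252).
write.table(t.muster, out,
            col.names = FALSE, row.names = FALSE, na = "", sep = ";",
            fileEncoding = "UTF-8")

# Inspect the raw bytes of the first record ("ausgefüllt" in quotes):
# the umlaut should be the two-byte UTF-8 sequence c3 bc, not the
# single Windows-1252 byte fc.
first_bytes <- readBin(out, what = "raw", n = 12)
print(first_bytes)
```

Separately, lapply() on a data frame returns a plain list, so t.muster <- lapply(t.muster, ...) replaces the data frame with a list; writing t.muster[] <- lapply(t.muster, ...) keeps the data.frame class.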
On Fri, 6 Oct 2006, Christian Bieli wrote:

> [...]
> After export the factor level including an umlaut of t.muster$screening
> look like this in the sql-database as well as in an excel spreadsheet:
>
> ausgefÃ¼llt

I think the problem is rather how you imported them. That is the UTF-8 representation of "ausgefüllt" viewed in a single-byte locale. R on Windows does not handle UTF-8, so something else has done the conversion.

[...]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
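The diagnosis can be reproduced in a few lines. The sketch below (run in a UTF-8 session; the variable names are illustrative) reads the UTF-8 bytes of "ü" as Latin-1, which agrees with Windows-1252 for these two bytes, and then reverses the damage. It also suggests why converting between ISO 8859-1 and ISO 8859-15, as tried in the question, could not help: both are single-byte encodings that store "ü" as the same byte.

```r
x <- "ausgef\u00fcllt"                          # "ausgefüllt"

# In UTF-8, "ü" (U+00FC) is the two bytes c3 bc.
print(charToRaw(enc2utf8("\u00fc")))            # [1] c3 bc

# Reinterpret the UTF-8 bytes as Latin-1: iconv() ignores the string's
# declared encoding and uses `from`, so each byte becomes one character.
garbled <- iconv(enc2utf8(x), from = "latin1", to = "UTF-8")
print(garbled)                                  # [1] "ausgefÃ¼llt"

# Undo it: turn the characters back into those bytes, then declare
# the bytes to be UTF-8 again.
repaired <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(repaired) <- "UTF-8"
stopifnot(identical(repaired, x))

# Latin-1 and Latin-9 (ISO-8859-15) both store "ü" as the single byte fc,
# so converting between them leaves the umlaut untouched.
u_latin1 <- iconv("\u00fc", from = "UTF-8", to = "ISO-8859-1")
u_latin9 <- iconv(u_latin1, from = "ISO-8859-1", to = "ISO-8859-15")
print(charToRaw(u_latin9))                      # [1] fc
```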
Thanks for your answer. I got around the problem by connecting directly to the SQL database instead of generating a .csv file and then uploading it. This works perfectly with the RODBC package and is much more suitable, too.

Kind regards
Christian

Prof Brian Ripley wrote:
> On Fri, 6 Oct 2006, Christian Bieli wrote:
> [...]
> I think the problem is rather how you imported them. That is the
> UTF-8 representation of the "ausgefüllt" viewed in a single-byte
> locale. R on Windows does not handle UTF-8, so something else has
> done the conversion.
>
> [...]
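For anyone taking the same route, here is a minimal sketch of writing the data frame straight into a table with RODBC. The DSN (read from a hypothetical MUSTER_DSN environment variable), the table name "muster" and the helper upload_muster() are assumptions for illustration. Whether the umlauts arrive intact still depends on how the ODBC driver and the database are configured for character sets.

```r
# Sketch: push a data frame into an SQL table over ODBC instead of via .csv.
# upload_muster(), the DSN and the table name are hypothetical.
upload_muster <- function(df, dsn, table) {
  ch <- RODBC::odbcConnect(dsn)
  # odbcConnect() signals failure with a warning and returns -1, not an error
  if (!inherits(ch, "RODBC")) stop("could not connect to DSN ", dsn)
  on.exit(RODBC::odbcClose(ch))
  # append = TRUE adds rows to an existing table; rownames = FALSE drops the
  # row-name column, mirroring row.names = FALSE in the original write.table()
  RODBC::sqlSave(ch, df, tablename = table, append = TRUE, rownames = FALSE)
}

t.muster <- data.frame(screening = rep("ausgef\u00fcllt", 50),
                       stringsAsFactors = FALSE)

dsn <- Sys.getenv("MUSTER_DSN")
# Only attempt the upload when a DSN is configured and RODBC is installed.
if (nzchar(dsn) && requireNamespace("RODBC", quietly = TRUE)) {
  upload_muster(t.muster, dsn, "muster")
}
```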