Hans Ekbrand
2009-Apr-07 23:11 UTC
[R] Converting a whole dataframe (including attributes) from latin1 to UTF-8
Hi list!

Short version: How do I convert a whole data.frame from latin1 encoding to UTF-8?

I get SPSS files with latin1 encoding. My OS is GNU/Linux with the locale sv_SE.utf8, and I normally interface with R through Emacs/ESS. I have used the following hack to convert a data.frame from latin1 to UTF-8:

> Sys.setlocale(category = "LC_ALL", locale = "sv_SE.iso88591")
> foo <- read.spss("foo.sav", to.data.frame=TRUE)
> write.table(foo, "foo.data")

$ recode lat1..utf8 foo.data

> Sys.setlocale(category = "LC_ALL", locale = "sv_SE.utf8")
> foo <- read.table("foo.data")

I have now found two problems with this approach:

a) variable.labels is dropped
b) the order of unordered factors is changed

I had just worked out a hack for a) when I realised b). b) is a problem when the factors really are ordered but are not recognized as such by read.spss (and/or not defined as such in SPSS; since SPSS respects the numeric values of the factors anyway, users don't need to define them).

Rather than hack around b) too, I wonder if anyone on the list knows how to convert a whole data.frame from latin1 encoding to UTF-8?

TIA

--
Hans Ekbrand (http://sociologi.cjb.net) <hans at sociologi.cjb.net>
A. Because it breaks the logical sequence of discussion
Q. Why is top posting bad?
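
One way to avoid the round trip through a text file might be to re-encode the strings in place with iconv(). The following is only a minimal sketch under stated assumptions, not a tested solution for every SPSS file: to_utf8 and df_to_utf8 are hypothetical helper names, all strings are assumed to really be latin1, and the variable labels are assumed to live in the "variable.labels" attribute that read.spss sets. Because only the level labels of each factor are re-encoded, the integer codes and the level order stay as they were.

```r
# Hypothetical helper: re-encode a character vector, pass anything else through.
to_utf8 <- function(x, from = "latin1") {
  if (is.character(x)) iconv(x, from = from, to = "UTF-8") else x
}

# Hypothetical helper: re-encode column names, character columns, factor
# levels and the "variable.labels" attribute of a data.frame.
df_to_utf8 <- function(df, from = "latin1") {
  for (j in seq_along(df)) {
    col <- df[[j]]
    if (is.factor(col)) {
      # Only the labels change; the underlying codes and level order do not.
      levels(col) <- to_utf8(levels(col), from)
    } else if (is.character(col)) {
      col <- to_utf8(col, from)
    }
    df[[j]] <- col
  }
  names(df) <- to_utf8(names(df), from)

  vl <- attr(df, "variable.labels")
  if (!is.null(vl)) {
    names(vl) <- to_utf8(names(vl), from)
    vl[] <- to_utf8(vl, from)  # vl[] keeps the (already converted) names
    attr(df, "variable.labels") <- vl
  }
  df
}

# Small made-up example: build latin1 strings from raw bytes so the code
# does not depend on the encoding of the source file.
latin1 <- function(bytes) {
  s <- rawToChar(as.raw(bytes))
  Encoding(s) <- "latin1"
  s
}
lag <- latin1(c(0x6c, 0xe5, 0x67))  # "lag" with a-ring
hog <- latin1(c(0x68, 0xf6, 0x67))  # "hog" with o-umlaut

demo <- data.frame(id = 1:2)
# Levels are deliberately in non-alphabetical order.
demo$inkomst <- factor(c(1L, 2L), levels = 1:2, labels = c(lag, hog))
attr(demo, "variable.labels") <- c(id = "Id", inkomst = hog)

out <- df_to_utf8(demo)
```

On real data this would be called as foo <- df_to_utf8(read.spss("foo.sav", to.data.frame=TRUE)). Other attributes, such as per-column value labels, are not touched by this sketch.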