Dear all,

I have imported a dataset from Stata using the foreign package. The original data contain French characters such as è and ç. After importing, string variables containing names of French departments have changed. E.g. Ardèche became Ard\x8fche. I would like to ask how I could plot these changed strings, since the strings with special characters now fail to be printed in the plot (using either plot() or ggplot2).

I have googled for solutions, but find it hard to determine whether I should change my R setup or read in the data in a different way. Since I work on a Mac, I changed my locale according to the R for Mac OS X FAQ, chapter 9. Below is some info on my setup, plus code and output showing what works for me and what does not. Thank you in advance for your comments.

Best,

Richard

#--------------
rm(list=ls())
sessionInfo()
# R version 2.15.2 (2012-10-26)
# Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
#
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# creating variables
department <- c("Nord","Paris","Ard\x8fche")
department2 <- c("Nord", "Paris", "Ardèche")
n <- c(2,4,1)

# creating dataframes
df <- data.frame(department,n)
df2 <- data.frame(department2,n)

department
# [1] "Nord" "Paris" "Ard\x8fche"
department2
# [1] "Nord" "Paris" "Ardèche"

plot(df) # fails to show the text "Ardèche"
plot(df2) # shows text "Ardèche"

# EOF

On Tuesday 11 December 2012 at 01:10 +0100, Richard Zijdeman wrote:
> I have imported a dataset from Stata using the foreign package. The
> original data contain French characters such as è and ç.
> After importing, string variables containing names of French
> departments have changed. E.g. Ardèche became Ard\x8fche. I would like
> to ask how I could plot these changed strings, since now the strings
> with special characters fail to be printed in the plot (either using
> plot() or ggplot2()).

Accented characters should work fine on a machine using a UTF-8 locale such as yours. I think the problem is that the imported data uses ISO-8859-15 or UTF-16, not UTF-8. I have no idea whether .dta files specify an encoding or not, but I think you can convert the strings in R by calling

    iconv(department, "ISO-8859-15", "UTF-8")

or

    iconv(department, "UTF-16", "UTF-8")

> # creating variables
> department <- c("Nord","Paris","Ard\x8fche")

\x8f does not correspond to "è" AFAIK. In ISO-8859-1 and -15 and UTF-16, it's \xE8 ("\uE8" in R). In UTF-8, it's C3 A8, "\303\250" in R.

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
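
The iconv() suggestion in this reply can be tried out roughly as follows. This is only a sketch, assuming the imported strings hold single-byte text in some legacy encoding. The vector is rebuilt from raw bytes so the example does not depend on how the parser handles "\x8f". The "macintosh" (Mac OS Roman) candidate is an addition that nobody in the thread proposed: in that encoding byte 0x8F is è, but whether the .dta file really used it is an assumption to verify against the full data. Encoding names accepted by iconv() vary by platform (see iconvlist()).

```r
# Rebuild the problem vector from raw bytes: "Ard", 0x8f, "che"
ardeche_bytes <- as.raw(c(0x41, 0x72, 0x64, 0x8f, 0x63, 0x68, 0x65))
department <- c("Nord", "Paris", rawToChar(ardeche_bytes))

# Reference bytes for "è", as described in the reply
e_grave <- enc2utf8("\u00e8")
charToRaw(e_grave)                            # c3 a8 (UTF-8)
charToRaw(iconv(e_grave, "UTF-8", "latin1"))  # e8 (ISO-8859-1)

# Candidate 1 (from the reply): treat the bytes as ISO-8859-15
from_latin9 <- iconv(department, from = "ISO-8859-15", to = "UTF-8")

# Candidate 2 (assumption, not from the thread): Mac OS Roman
from_mac <- iconv(department, from = "macintosh", to = "UTF-8")

# Compare the bytes each candidate produces for the third element;
# the right source encoding is the one that yields c3 a8 for the accent
charToRaw(from_latin9[3])
charToRaw(from_mac[3])

# Keep whichever candidate gives readable names, then rebuild the data frame
department_fixed <- from_mac
df <- data.frame(department = department_fixed, n = c(2, 4, 1))
```

Comparing raw bytes rather than printed strings avoids being misled by how the console renders invalid or unexpected characters.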

On 12-12-10 7:10 PM, Richard Zijdeman wrote:
> I have imported a dataset from Stata using the foreign package. The original data contain French characters such as è and ç.
> After importing, string variables containing names of French departments have changed. E.g. Ardèche became Ard\x8fche. I would like to ask how I could plot these changed strings, since now the strings with special characters fail to be printed in the plot (either using plot() or ggplot2()).

As Milan has said, it's an encoding problem. I don't know any encoding that represents an "è" character by \x8f, but if it does that consistently, you can fix it pretty easily:

    > x
    [1] "Ard\x8fche"
    > x <- sub("\x8f", "è", x, useBytes=TRUE)
    > x
    [1] "Ardèche"

You'll have to read the results pretty carefully to make sure you catch all the corrections (and to make sure they are done correctly), but you should be able to fix things.

Duncan Murdoch
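
The byte-level substitution in this reply could be extended to a whole vector along these lines. This is a sketch under the assumption that each stray byte maps consistently to one intended character. The lookup vectors bad_bytes and good_chars are hypothetical names and hold only the single pair known from the thread (0x8F to è); any further pairs would have to be found by inspecting the data. gsub() with fixed = TRUE is used instead of sub() so that every occurrence in a string is replaced, and validUTF8() gives a mechanical first pass at the "read the results carefully" step.

```r
# Rebuild the problem vector from raw bytes: "Ard", 0x8f, "che"
ardeche_bytes <- as.raw(c(0x41, 0x72, 0x64, 0x8f, 0x63, 0x68, 0x65))
department <- c("Nord", "Paris", rawToChar(ardeche_bytes))

# Hypothetical lookup: stray byte -> intended character (one known pair)
bad_bytes  <- as.raw(c(0x8f))
good_chars <- enc2utf8(c("\u00e8"))

fixed <- department
for (i in seq_along(bad_bytes)) {
  # Literal, byte-wise replacement of every occurrence of the stray byte
  fixed <- gsub(rawToChar(bad_bytes[i]), good_chars[i], fixed,
                fixed = TRUE, useBytes = TRUE)
}

# The replaced strings now contain UTF-8 bytes; declare them as such
Encoding(fixed) <- "UTF-8"

# Any string that is still not valid UTF-8 needs another lookup entry
still_bad <- fixed[!validUTF8(fixed)]
still_bad
```

An empty still_bad only shows that the strings are valid UTF-8, not that every replacement produced the intended letter, so a visual check of the department names is still needed.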