Dear List,

I have data which contain the special German characters "ä", "ö", "ü", etc. After I read the text files into R, those characters are displayed strangely, e.g. "ä" appears as "Ã¤". My first step is to replace each of them with its usual transcription (e.g. "ä" becomes "ae") using the gsub command.

Until I upgraded to version 2.10.1 (from 2.8.0), this worked perfectly for all characters. Now it works for every character except "Ü":

temp1<-gsub("Ãœ","Ue",temp1)

This letter is displayed as "Ãœ" (as before), but R no longer finds it. The problem seems to be linked to the second character, "œ", since I can substitute "Ã" on its own without a problem. Strangely, if I extract the two characters into a variable with the substr command and then use that variable as the pattern, the substitution works.

Any ideas what I am missing?

Thanks,

Michael
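To see where garbled forms like "Ã¤" and "Ãœ" come from, here is a small R sketch. It assumes the files are UTF-8 and that the session decodes them as Windows-1252 (CP1252); that code page is an assumption not stated in the thread, chosen because it is where byte 0x9C maps to "œ". The helper `garble()` is hypothetical and only reproduces the misreading from raw bytes.

```r
# Assumption: the text files are UTF-8, but the R session decodes them as
# Windows-1252 (CP1252). garble() is a hypothetical helper for illustration.
garble <- function(ch) {
  bytes <- charToRaw(enc2utf8(ch))                        # UTF-8 bytes of ch
  iconv(rawToChar(bytes), from = "CP1252", to = "UTF-8")  # misread as CP1252
}

garble("\u00e4")  # "ä" is bytes C3 A4, which CP1252 shows as "Ã¤"
garble("\u00dc")  # "Ü" is bytes C3 9C, which CP1252 shows as "Ãœ"
```

Matching these two-character sequences treats the symptom; declaring the correct encoding when reading the file avoids producing them in the first place.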
Duncan Murdoch
2010-Apr-15 17:32 UTC
[R] Problem with ONE of the Special German Characters
On 15/04/2010 12:22 PM, Michael Stegh wrote:
> Dear List,
>
> I have data which contain the special German characters "ä", "ö", "ü" etc. After reading the
> text files into R those characters are displayed strangely, e. g. "ä" is "Ã¤". The first step is to
> replace those with their typical transcription, e. g. "ä" becomes "ae" by using the gsub
> command.

Your example ("ä" displayed as "Ã¤") is what you would see if you stored it in UTF-8 encoding, then read it in Latin1. So I suspect you need to declare the encoding of the files you are reading before reading them. You can do this as follows:

con <- file("foo.txt", encoding="UTF-8", open="r")
readLines(con)
close(con)

By default, R assumes the encoding of files matches the default encoding on your system.

> Until I upgraded to version 2.10.1 (from 2.8.0) this worked perfectly for all characters. Now it
> works for all characters but "Ü".
>
> temp1<-gsub("Ãœ","Ue",temp1)

You might want to try perl=TRUE in the gsub() call; it seems to handle strange characters in regular expressions better than the default TRE library does.

Duncan Murdoch
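A minimal end-to-end sketch of the suggestion above, under stated assumptions: the file name and sample line are hypothetical (written to a temporary file so the example runs), the real files are assumed to be UTF-8, and the umlaut table is illustrative. Once the file is decoded correctly, the pattern is the real "ü" rather than a garbled byte sequence. The sketch uses fixed = TRUE, a swapped-in alternative to the perl = TRUE advice, which matches literal text and does not go through a regular-expression engine at all.

```r
# Hypothetical sample data: one UTF-8 line written to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8("M\u00fcller, \u00dcbersee"), path, useBytes = TRUE)

# Declare the file's encoding so R re-encodes the text on input.
con <- file(path, encoding = "UTF-8", open = "r")
temp1 <- readLines(con)
close(con)

# Illustrative umlaut table (assumed; extend as needed).
from <- c("\u00e4", "\u00f6", "\u00fc", "\u00c4", "\u00d6", "\u00dc")
to   <- c("ae", "oe", "ue", "Ae", "Oe", "Ue")

# fixed = TRUE treats each pattern as a literal string, not a regex.
for (i in seq_along(from)) {
  temp1 <- gsub(from[i], to[i], temp1, fixed = TRUE)
}
print(temp1)  # "Mueller, Uebersee"
```

Replacing the decoded characters, rather than their garbled forms, keeps the table independent of whichever encoding the files happened to be misread in.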