Huidong Tian
2013-Mar-23  14:08 UTC
[R] Character change to Unicode format escape character when create a data frame
Hi,
I want to create a data frame with a column containing some special characters, like "ø". When I print that data frame, the content changes to <U+00F8>, and when I save the data frame to a txt file, the content stays in that form, but I need it in its original form. Can anybody explain?

> x <- data.frame(part = c("målløs", "ny"))
> x
  part
1 m<U+00E5>ll<U+00F8>s
2 ny
> x[1,1]
[1] målløs
Levels: m<U+00E5>ll<U+00F8>s ny

Regards!
--
Huidong Tian, Postdoc
Swedish University of Agricultural Sciences

	[[alternative HTML version deleted]]
David Winsemius
2013-Mar-23  18:47 UTC
[R] Character change to Unicode format escape character when create a data frame
On Mar 23, 2013, at 7:08 AM, Huidong Tian wrote:

> Hi,
> I want to create a data frame including a column containing some special characters, like "ø". when I print that data frame out, the content change to <U+00F8>, and when save the data frame to a txt file, the content keep in that style, but I need it in its original form, anybody can explain?
>
>
>> x <- data.frame(part = c("målløs", "ny"))
>> x
>   part
> 1 m<U+00E5>ll<U+00F8>s
> 2 ny
>
> x[1,1]
> [1] målløs
> Levels: m<U+00E5>ll<U+00F8>s ny
>

You have two problems. The trivial one is that by default data.frame stores character input as factors. The more fundamental difficulty is that you do not understand that the display of characters is not the same as their internal representation. You have already achieved what you want and do not realize it. The number of characters in x[1,1] will be 6. Try it:

> x <- data.frame(part = c("målløs", "ny"), stringsAsFactors=FALSE)
> x
    part
1 målløs
2     ny
> nchar(x[1,1])
[1] 6

Also try:

cat(x$part)

It's just that <U+00E5> is one way of representing a character that is not in the font table for the device you are working on. Internally it is a single character in UTF-8 encoding. My Mac does keep the 'ø', and my sans font is Helvetica, but you may be on a machine with a different sans font.

> quartzFonts('sans')
$sans
[1] "Helvetica"             "Helvetica-Bold"        "Helvetica-Oblique"
[4] "Helvetica-BoldOblique"

If you are on a different interactive device, you should look at its help page to see how to change its settings. For me that is

?quartz
# but for you it might be
?windows

If you want to see the printed translation of the input, you can try cat(), or you can print to a device that has a font with the proper glyph. The system settings can be adjusted with the various functions that specify the fonts used by the different devices:

?Devices
?options
?Encoding

>
> 	[[alternative HTML version deleted]]

You should learn to post in plain text. (Gmail does support that choice, but you need to make the effort.)
This is a question whose answer can be machine-dependent, and for any follow-up questions you need to include the output of sessionInfo() as requested in the Posting Guide.

--
David Winsemius
Alameda, CA, USA
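A minimal base-R sketch of the points made in this thread, assuming a UTF-8 locale (for example Linux, macOS, or a recent Windows build of R, 4.2 or later, running in UTF-8). The string is built from \u escapes so the script itself is plain ASCII. The code then checks that "å" and "ø" each count as one character (two bytes in UTF-8), shows the code points behind the <U+00E5> and <U+00F8> notation, and writes the data frame to a temporary text file with an explicit fileEncoding before reading it back. The variable names word, n_chars, n_bytes, codes, path and y are illustrative only.

```r
# Minimal sketch, assuming a UTF-8 locale; uses base R only.

# Build "målløs" from Unicode escapes so this script is pure ASCII.
word <- "m\u00e5ll\u00f8s"

# Keep the column as character rather than factor
# (stringsAsFactors = FALSE is the default from R 4.0.0 onward).
x <- data.frame(part = c(word, "ny"), stringsAsFactors = FALSE)

# "å" and "ø" are one character each ...
n_chars <- nchar(x$part[1])                  # 6
# ... but each of them takes two bytes in UTF-8.
n_bytes <- nchar(x$part[1], type = "bytes")  # 8

# The <U+00E5> / <U+00F8> notation is just the Unicode code point.
codes <- utf8ToInt(x$part[1])
print(sprintf("U+%04X", codes))
# values: "U+006D" "U+00E5" "U+006C" "U+006C" "U+00F8" "U+0073"

# cat() writes the characters themselves, one string per line here.
cat(x$part, sep = "\n")

# Save to a text file with an explicit encoding, then read it back.
path <- tempfile(fileext = ".txt")
write.table(x, path, row.names = FALSE, quote = FALSE,
            fileEncoding = "UTF-8")
y <- read.table(path, header = TRUE, stringsAsFactors = FALSE,
                fileEncoding = "UTF-8")

# The round trip should give back the same strings.
print(all(y$part == x$part))
```

Whether print() shows the glyph or the <U+...> escape depends on the session's locale and output device, which is one reason the sessionInfo() output matters for follow-up questions.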