Wyatt, Kristin M
2012-Aug-09 04:53 UTC
[R] Using unicode symbol has unexpected results in levels of factor object
Dear all,

When I use a Unicode symbol in the labels of a factor object, the corresponding level does not display as expected. However, calling levels() on the factor returns the desired output. I noticed the discrepancy when the legend labels from a call to ggplot() did not display the desired symbol, but an explicitly built legend using the same labels did.

Example (I am trying to get the less-than-or-equal-to symbol):

> .df <- data.frame(afp = c(0,0,1,1), time=c(0,2,0,1), surv=c(1, 0.5, 1, 0.4))
> afpLabels <- c("AFP \u2264 16", "AFP > 16")
> afpStrata <- factor(.df$afp, labels=afpLabels)
> afpStrata
[1] AFP ≤ 16 AFP ≤ 16 AFP > 16 AFP > 16
Levels: AFP = 16 AFP > 16

The first level is reported as "AFP = 16".

> levels(afpStrata)
[1] "AFP ≤ 16" "AFP > 16"
>

The desired result is produced with levels().

The code below shows this issue in context through calls to ggplot(), if you don't mind loading all the libraries.

> library(ggplot2)
> library(gridExtra)
> library(plyr)
>
> ggplot(.df, aes(time, surv)) + geom_step(aes(color = afpStrata), size = 1.0)
>
> ggplot(.df, aes(time, surv)) + geom_step(aes(color = afpStrata), size = 1.0) +
+ scale_colour_hue(breaks=afpLabels, labels=afpLabels)
>

I am running a pre-compiled version of R on Windows 7 (64-bit).

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] plyr_1.7.1    gridExtra_0.9 ggplot2_0.9.1

loaded via a namespace (and not attached):
 [1] colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2       labeling_0.1
 [5] MASS_7.3-18        memoise_0.1        munsell_0.3        proto_0.3-9.2
 [9] RColorBrewer_1.0-5 reshape2_1.2.1     scales_0.2.1       stringr_0.6.1
[13] tools_2.15.1
>

Sincerely,
Kristin Berry
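The behaviour described above can be checked without ggplot2. The following base-R snippet is an illustrative sketch and was not part of the original post. It rebuilds the factor and inspects the stored level directly. `utf8ToInt()` returns Unicode code points, so finding 8804 (hex 2264) at position 5 of the first level indicates that the intended character is stored correctly. That would mean only the printed `Levels:` line is affected.

```r
# Rebuild the factor from the post (base R only)
.df <- data.frame(afp = c(0, 0, 1, 1), time = c(0, 2, 0, 1),
                  surv = c(1, 0.5, 1, 0.4))
afpLabels <- c("AFP \u2264 16", "AFP > 16")
afpStrata <- factor(.df$afp, labels = afpLabels)

lev <- levels(afpStrata)
# The stored levels are exactly the labels supplied, whatever print() shows
print(identical(lev, afpLabels))
# Code points of the first level; 8804 is U+2264 (less-than or equal to)
print(utf8ToInt(lev[1]))
```

Because the check works on the stored strings rather than on printed output, its result does not depend on the console or locale.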
peter dalgaard
2012-Aug-09 09:02 UTC
[R] Using unicode symbol has unexpected results in levels of factor object
On Aug 9, 2012, at 06:53 , Wyatt, Kristin M wrote:

> [...]
>> afpStrata
> [1] AFP ≤ 16 AFP ≤ 16 AFP > 16 AFP > 16
> Levels: AFP = 16 AFP > 16
> [...]
> I am running a pre-compiled version of R on Windows 7 (64-bit).
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)

For whatever it is worth, this works fine (both examples) under OSX Snow Leopard.

Looking at the code for print.factor, I would strongly suspect that the culprit is the line

  n <- length(lev <- encodeString(levels(x), quote = ifelse(quote, "\"", "")))

which figures, since you are in a .1252 locale, not .utf8 (or UTF-8 or ...).

Over to the Windows/locale/charset experts...
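Peter's suspicion can be tested directly by calling `encodeString()` on the levels the same way `print.factor` does. With the default `quote = FALSE`, the quote argument evaluates to `""`. The snippet below is a hedged diagnostic sketch that uses base R only. In a UTF-8 locale the stored and encoded levels would be expected to match. In a CP1252 locale the first one is expected to differ, though the exact replacement text may vary by platform and R version.

```r
# Recreate the factor from the thread (base R only)
afpStrata <- factor(c(0, 0, 1, 1),
                    labels = c("AFP \u2264 16", "AFP > 16"))

stored <- levels(afpStrata)
# Mirrors the encodeString() call quoted from print.factor, with quote = FALSE
shown <- encodeString(stored, quote = "")

# TRUE when the session runs in a UTF-8 locale
print(l10n_info()$`UTF-8`)
# Element-wise: does the string print.factor would display match the stored level?
print(stored == shown)
```

The second level is plain ASCII and should survive `encodeString()` unchanged in any locale. That makes it a useful control when comparing the two results.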
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com