Duncan Murdoch
2015-May-25 16:43 UTC
[Rd] Unicode display problem with data frames under Windows
On 25/05/2015 11:37 AM, Ista Zahn wrote:
> AFAIK this is the way it works on Windows. It has been discussed in several
> places, e.g.
> http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
> ,
> http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
> (both of these came up when I googled the subject line of your email).

Yes, but it is a bug, just a hard one to fix. It needs someone to dedicate a serious amount of time to deal with it.

Since most of the people who tend to do that generally use systems in UTF-8 locales where this isn't a problem, or don't use Windows, it is languishing.

Duncan Murdoch

> Best,
> Ista
>
> On May 25, 2015 9:39 AM, "Richard Cotton" <richierocks at gmail.com> wrote:
> > Here's a data frame with some Unicode symbols (set intersection and union).
> >
> > d <- data.frame(x = "A \u222a B \u2229 C")
> >
> > Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I
> > see
> >
> > d
> > ##                  x
> > ## 1 A <U+222A> B n C
> >
> > Printing the column itself works fine.
> >
> > d$x
> > ## [1] A ∪ B ∩ C
> > ## Levels: A ∪ B ∩ C
> >
> > The encoding is correctly UTF-8.
> >
> > Encoding(as.character(d$x))
> > ## [1] "UTF-8"
> >
> > Under Linux both forms of printing are fine for me.
> >
> > I'm not quite sure whether I've missed a setting or if this is a bug, so
> >
> > Am I doing something silly?
> > Can anyone else reproduce this?
> >
> > --
> > Regards,
> > Richie
> >
> > Learning R
> > 4dpiecharts.com
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
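A minimal diagnostic sketch, using only base R, for anyone trying to tell whether their own session is affected. It checks whether R is running in a UTF-8 locale (where, per the reply above, the problem does not arise) and confirms that the stored string still holds the intended code points, which would suggest that any damage is confined to printing. The variable names are illustrative only, and `stringsAsFactors = FALSE` is an assumption made here to keep the column as plain character.

```r
# Build the data frame from the original report; keep the column as character.
d <- data.frame(x = "A \u222a B \u2229 C", stringsAsFactors = FALSE)

# TRUE when the session runs in a UTF-8 locale, FALSE otherwise.
in_utf8_locale <- isTRUE(l10n_info()[["UTF-8"]])

# Inspect the stored string directly: U+222A (union) and U+2229 (intersection)
# should both be present regardless of how the value is displayed.
code_points <- utf8ToInt(d$x)
has_union_and_intersection <- all(c(0x222A, 0x2229) %in% code_points)

cat("UTF-8 locale:", in_utf8_locale, "\n")
cat("Code points intact:", has_union_and_intersection, "\n")
```

If the second line reports TRUE while `print(d)` still shows `<U+222A>`, the data are fine and only the display path is losing characters.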
Duncan Murdoch
2015-May-25 16:45 UTC
[Rd] Unicode display problem with data frames under Windows
On 25/05/2015 12:43 PM, Duncan Murdoch wrote:
> Yes, but it is a bug, just a hard one to fix. It needs someone to
> dedicate a serious amount of time to deal with it.
>
> Since most of the people who tend to do that generally use systems in
> UTF-8 locales where this isn't a problem, or don't use Windows, it is
> languishing.

Oops, I meant to write "or don't use non-ASCII characters"; a UTF-8 locale already implies non-Windows.

Duncan Murdoch
Peter Meissner
2015-May-25 19:12 UTC
[Rd] Unicode display problem with data frames under Windows
On .05.2015 at 18:43, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> Yes, but it is a bug, just a hard one to fix. It needs someone to
> dedicate a serious amount of time to deal with it.
>
> Since most of the people who tend to do that generally use systems in
> UTF-8 locales where this isn't a problem, or don't use Windows, it is
> languishing.
>
> Duncan Murdoch

I understand that these problems are not easy to fix, but I think that "most of the people who tend to do that generally use systems in UTF-8 locales" is a biased perception. Developers might tend to use Mac or Linux most often. For everyone else, Windows still is, and probably will remain, the OS most often used. For most of them, switching to something else is a major hurdle.

What I often witness is that those "non-existent" Windows users try to muddle through with numerous calls to Encoding(), iconv() and the like, while never being sure whether the strange behavior is due to their own lack of understanding, to Windows specifics, or to R. In the end they either succeed with their muddling or give up, but they do not change the system.

So whoever attempts this Herculean task will be praised by thousands ;-)

Best,
Peter
Duncan Murdoch
2015-May-25 19:35 UTC
[Rd] Unicode display problem with data frames under Windows
On 25/05/2015 3:12 PM, Peter Meissner wrote:
> I understand that these problems are not easy to fix, but I think that
> "most of the people who tend to do that generally use systems in UTF-8
> locales" is a biased perception. Developers might tend to use Mac or
> Linux most often. For everyone else, Windows still is, and probably will
> remain, the OS most often used. For most of them, switching to something
> else is a major hurdle.
>
> What I often witness is that those "non-existent" Windows users try to
> muddle through with numerous calls to Encoding(), iconv() and the like,
> while never being sure whether the strange behavior is due to their own
> lack of understanding, to Windows specifics, or to R. In the end they
> either succeed with their muddling or give up, but they do not change
> the system.
>
> So whoever attempts this Herculean task will be praised by thousands ;-)

I'm not sure we disagree. R is a volunteer project, and the things that get done are the things that someone volunteers to do.

But in this particular case, the volunteer needs a lot of knowledge about R internals to make progress, and there just aren't that many people like that. They are all "developers".

If you aren't one of those people, you need to motivate one of them to volunteer to take this on. I don't think a financial contribution would work, but people do return favours: so do something that makes one of the developers' lives a lot easier, and then point out how this particular bug is causing trouble for you, and maybe they'll choose to return the favour.

Duncan Murdoch
Richard Cotton
2015-May-26 07:01 UTC
[Rd] Unicode display problem with data frames under Windows
On 25 May 2015 at 19:43, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
>
> Yes, but it is a bug, just a hard one to fix. It needs someone to dedicate
> a serious amount of time to deal with it.
>
> Since most of the people who tend to do that generally use systems in UTF-8
> locales where this isn't a problem, or don't use Windows, it is languishing.

Thanks for the link and the explanation of why the bug exists.

>> On May 25, 2015 9:39 AM, "Richard Cotton" <richierocks at gmail.com> wrote:
>>
>> > Here's a data frame with some Unicode symbols (set intersection and
>> > union).
>> >
>> > d <- data.frame(x = "A \u222a B \u2229 C")
>> >
>> > Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I
>> > see
>> >
>> > d
>> > ##                  x
>> > ## 1 A <U+222A> B n C

For future readers searching for a solution to this, you can get correct printing by setting the CTYPE part of the locale to Chinese/Japanese/Korean.

    Sys.setlocale("LC_CTYPE", "Chinese")
    ## [1] "Chinese (Simplified)_People's Republic of China.936"

    d
    ##           x
    ## 1 A ∪ B ∩ C

--
Regards,
Richie

Learning R
4dpiecharts.com
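Changing LC_CTYPE for the whole session can affect other character handling, so it may be safer to switch only around the print call. A minimal sketch, assuming a hypothetical helper named `with_ctype` (not part of base R); the locale name "Chinese" is the Windows name from the message above and will not be recognised on other platforms, in which case the helper just prints with the current locale.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to `ctype`,
# then restore the previous setting even if `expr` fails.
with_ctype <- function(ctype, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) when the request cannot be honoured.
  new <- suppressWarnings(Sys.setlocale("LC_CTYPE", ctype))
  if (identical(new, "")) {
    message("locale '", ctype, "' is not available; using the current one")
  }
  # `expr` is a lazily evaluated argument, so it runs only now,
  # after the locale switch has been attempted.
  force(expr)
}

d <- data.frame(x = "A \u222a B \u2229 C")
with_ctype("Chinese", print(d))
```

Because the argument is evaluated lazily, `print(d)` runs inside the helper under the temporary locale, and `on.exit()` puts the original setting back afterwards.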
Peter Meissner
2015-May-26 07:29 UTC
[Rd] Unicode display problem with data frames under Windows
On .05.2015 at 09:01, Richard Cotton <richierocks at gmail.com> wrote:
> For future readers searching for a solution to this, you can get
> correct printing by setting the CTYPE part of the locale to
> Chinese/Japanese/Korean.
>
> Sys.setlocale("LC_CTYPE", "Chinese")
> ## [1] "Chinese (Simplified)_People's Republic of China.936"
>
> d
> ##           x
> ## 1 A ∪ B ∩ C

There is another workaround. The problem with the character transformation when printing data frames stems from format() being used within print.data.frame(). Defining your own class with a print method that does not use format() allows for correct printing in all locales.

Like this:

    d <- data.frame(x = "A \u222a B \u2229 C")
    d
    ##                  x
    ## 1 A <U+222A> B n C

    class(d) <- c("unicode_df", "data.frame")

    # this is print.data.frame from base R with only two lines modified, see #old#
    print.unicode_df <- function(x, ..., digits = NULL, quote = FALSE,
                                 right = TRUE, row.names = TRUE) {
      n <- length(row.names(x))
      if (length(x) == 0L) {
        cat(sprintf(ngettext(n, "data frame with 0 columns and %d row",
                             "data frame with 0 columns and %d rows",
                             domain = "R-base"), n), "\n", sep = "")
      } else if (n == 0L) {
        print.default(names(x), quote = FALSE)
        cat(gettext("<0 rows> (or 0-length row.names)\n"))
      } else {
        #old# m <- as.matrix(format.data.frame(x, digits = digits,
        #old#                                  na.encode = FALSE))
        m <- as.matrix(x)
        if (!isTRUE(row.names))
          dimnames(m)[[1L]] <- if (identical(row.names, FALSE))
            rep.int("", n) else row.names
        print(m, ..., quote = quote, right = right)
      }
      invisible(x)
    }

    d
    ##      x
    ## [1,] A ∪ B ∩ C
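One way to check on a given machine whether format() is the step that loses characters is to compare code points before and after formatting. This is only a sketch: `survives_format` is a hypothetical helper name, and the result depends on the locale of the session in which it is run.

```r
# Hypothetical helper: TRUE if a single string `s` comes back from format()
# with exactly the same code points it went in with.
survives_format <- function(s) {
  formatted <- enc2utf8(format(s))
  identical(utf8ToInt(formatted), utf8ToInt(enc2utf8(s)))
}

s <- "A \u222a B \u2229 C"
cat("format() preserves the string:", survives_format(s), "\n")
```

A FALSE here, combined with correct output from `print(as.matrix(d), quote = FALSE)`, would be consistent with the explanation above that skipping format() avoids the problem.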