Dear all,

I want to send UTF-8 characters to the console. The font in the GUI preferences is 'Lucida Console', which supports the desired symbols:

greater than or equal: Unicode U+2265, HTML-entity ≥, HTML-Unicode ≥, TeX \ge
approximately equal: Unicode U+2248, HTML-entity ≈, HTML-Unicode ≈, TeX \approx

txt <- "x ≥ y, x \u2265 y; a ≈ b, a \u2248 b"
Encoding(txt) <- "UTF-8"
print(txt)
[1] "x = y, x = y; a \230 b, a \230 b"
cat(txt, "\n")
x = y, x = y; a ? b, a ? b

Desired: "x ≥ y, x ≥ y; a ≈ b, a ≈ b"

I'm sending the email in UTF-8. Don't know how @r-project.org is configured (ASCII?). If you see garbage, I'm sorry, but you should get the idea.

R 4.2.0 on Windows 7 (UCRT10.0.10240.16390) and Windows 11.

Helmut
--
Ing. Helmut Schütz
BEBAC - Consultancy Services for
Bioequivalence and Bioavailability Studies
Neubaugasse 36/11
1070 Vienna, Austria
E helmut.schuetz at bebac.at
Wikipedia indicates that there are multiple flavors of UTF-8, but here is one solution. Wikipedia lists Unicode characters:
https://en.wikipedia.org/wiki/List_of_Unicode_characters
Toward the bottom of the article is a table of math symbols.

print("x\u2265y")
print("a\u223cy")

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Helmut Schütz
Sent: Thursday, June 23, 2022 6:26 AM
To: r-help at r-project.org
Subject: [R] UTF-8 to the console

Dear all,
I want to send UTF-8 characters to the console.
[...]

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
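As a small sketch of looking symbols up by code point, as suggested above, the snippet below builds the characters from their Unicode code points and prints the code point back. The variable names are only illustrative. Note that U+223C is the tilde operator (∼), while the "approximately equal" sign from the original question is U+2248 (≈). Whether the console then displays the symbols correctly still depends on the terminal and its encoding.

```r
# Build the symbols from their Unicode code points.
ge     <- intToUtf8(0x2265)  # GREATER-THAN OR EQUAL TO
approx <- intToUtf8(0x2248)  # ALMOST EQUAL TO
tilde  <- intToUtf8(0x223C)  # TILDE OPERATOR (not the same character as U+2248)

# Go the other way: report the code point of a character.
code_point <- function(ch) sprintf("U+%04X", utf8ToInt(ch))

code_point(ge)
code_point(approx)
code_point(tilde)

# Assemble the desired string; the display depends on the console.
txt <- sprintf("x %s y; a %s b", ge, approx)
cat(txt, "\n")
```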
On Thu, 23 Jun 2022 12:26:23 +0200 Helmut Schütz <helmut.schuetz at bebac.at> wrote:

> txt <- "x ≥ y, x \u2265 y; a ≈ b, a \u2248 b"
> Encoding(txt) <- "UTF-8"

There shouldn't be a need to change the encoding. If you're creating a Unicode literal, R should already choose UTF-8 for the resulting string. Either way, R automatically converts the strings from their source encoding on output.

Moreover, `Encoding<-` doesn't perform any conversion, it only changes the declared encoding on the string, affecting the way it may be encoded or decoded in the future. If Encoding(txt) wasn't already UTF-8, you would likely be damaging the data:

string <- '?' # is already UTF-8
# No conversion happens, the same bytes re-interpreted differently
Encoding(string) <- 'latin1'
string
# [1] "??"

> R 4.2.0 on Windows 7

On Windows 7, Rterm will stay limited to the OEM encoding, since UCRT only supports UTF-8 locales on Windows ≥ 10, version 1903. If your OEM encoding doesn't have the ≥, ≈ characters, printing them to the console is going to be hard. Not impossible -- e.g. an R extension written in C could obtain a handle to the current console and use the Unicode-aware Windows API to print these characters -- but just getting it to work would be hard, and it would likely be unportable.

> and Windows 11.

I think it should be possible. What does system('chcp') say in your Rterm session?

For console UTF-8 output to work, two things should happen:

1. The console must be using UTF-8, i.e. chcp must say it's using code page 65001.
2. Rterm must understand that and also use UTF-8 on output.

What do sessionInfo() and l10n_info() say in your Rterm session on Windows 11?

In the Rterm source code, I see a check for GetACP() == 65001, which should have switched the console encoding to UTF-8 automatically. Perhaps you need to run chcp 65001 before starting Rterm? Maybe you need to set a checkbox [*] to make the ANSI codepage UTF-8 by default?
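The checks asked for above could be collected in one place roughly as follows. This is only a diagnostic sketch, not a fix: it reports what R believes about the native encoding and, on Windows, asks the console for its code page via chcp. The variable names are illustrative.

```r
# What R believes about the current locale.
info <- l10n_info()
cat("Native encoding is UTF-8:", info[["UTF-8"]], "\n")

if (.Platform$OS.type == "windows") {
  # ANSI code page as seen by R; 65001 means UTF-8.
  cat("ANSI code page:", info[["codepage"]], "\n")
  # Console code page; should also report 65001 for UTF-8 output.
  system("chcp")
}

# A literal written with \u escapes is already marked as UTF-8,
# so no `Encoding<-` call is needed.
txt <- "x \u2265 y; a \u2248 b"
Encoding(txt)

# Full session details, as requested in the message.
print(sessionInfo())
```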
I'm not sure any of this is going to work, but it's something to try before someone more knowledgeable with R on Windows can help you.

--
Best regards,
Ivan

[*] https://superuser.com/a/1451686
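To illustrate the point made in this reply that `Encoding<-` only relabels a string while a real conversion needs iconv(), here is a minimal sketch. The character "ü" (U+00FC) is chosen purely for illustration and is not necessarily the one used in the message above.

```r
s <- "\u00fc"              # "ü", stored as the two UTF-8 bytes c3 bc
Encoding(s)                # declared encoding of the literal
charToRaw(s)

# Relabelling: the bytes stay the same, only the declared encoding changes,
# so each of the two bytes is now read as a separate Latin-1 character.
bad <- s
Encoding(bad) <- "latin1"
same_bytes <- identical(charToRaw(bad), charToRaw(s))
nchar(s)                   # one character
nchar(bad)                 # two characters: the data is now misread

# Real conversion: the bytes themselves change to the Latin-1 representation.
good <- iconv(s, from = "UTF-8", to = "latin1")
charToRaw(good)
```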
Hi,

from what I can tell, Unicode-related issues (in several programming languages) are often specifically related to the MS Windows operating system rather than to R (though that does not imply it is irrelevant here).

You may wish to have a look at:
https://blog.r-project.org/2020/05/02/utf-8-support-on-windows/

This may provide directions to solving your issue.

Yours.
Olivier.

On Thu, 23 Jun 2022 12:26:23 +0200 Helmut Schütz <helmut.schuetz at bebac.at> wrote:
> Dear all,
>
> I want to send UTF-8 characters to the console.
> [...]

--
Olivier Crouzet, PhD
http://olivier.ghostinthemachine.space
/Maître de Conférences/
@LLING - Laboratoire de Linguistique de Nantes
UMR6310 CNRS / Université de Nantes
Dear Helmut,

thanks for the report, this is actually a bug in Rterm (or Windows, hard to tell, but something that can be fixed in Rterm). More below.

On 6/23/22 12:26, Helmut Schütz wrote:
> Dear all,
>
> I want to send UTF-8 characters to the console.
> [...]
> R 4.2.0 on Windows 7 (UCRT10.0.10240.16390) and Windows 11.

The underlying problem I can reproduce on my Windows 10 (which is almost surely what you are seeing on Windows 11) is that the characters ≥ and ≈ cannot be pasted to Rterm when running in cmd.exe or PowerShell. Pasting these characters pastes nothing.

I've fixed this now in R-devel 83094 (and R-patched 83095). I would be grateful if you (or anyone else) could test, e.g. in R-patched. Most likely this example will work as it did for me, but please also try other examples you can think of.

Processing the input keys in Rterm/getline is very tricky and brittle. What the code sees depends on what the console implementation decides to do, and it differs for different console implementations, and sadly this is not documented as far as I could find.

Now, the problem you reported does not happen in the Msys2/mintty (so Rtools42) terminal, because that terminal uses a different console implementation. Also, the problem doesn't happen with the Windows Terminal application, which has a yet different implementation.
If you ever needed a work-around for such problems, I would recommend trying the Windows Terminal application. The problem doesn't happen in Rgui, either, but that uses a completely different code path on the R end; indeed, it does not run Rterm.

There is a key combination "Alt+I" you can press in Rterm, which will switch to debug mode and display the keyboard codes R receives (it matches the sources in getline.c). When one sees different behavior, as with your reported problem, in different console implementations, it usually comes with different keyboard codes being sent to R.

Your report has been very useful, thanks, and sorry for the long delay. I would have spotted it earlier on the R bugzilla (or the R-devel list).

Best
Tomas
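To check whether a running build is recent enough to contain the Rterm fix mentioned in this message (R-devel 83094, R-patched 83095), one could compare the build's SVN revision against those numbers. The helper below is hypothetical and only a rough sketch: it assumes that revision numbers are comparable across branches and uses the R-patched revision as a conservative threshold.

```r
# Hypothetical helper: TRUE if the given SVN revision is at least `min_rev`.
# Returns FALSE when the revision string is not numeric.
includes_rterm_fix <- function(svn_rev = R.version[["svn rev"]],
                               min_rev = 83095L) {
  rev <- suppressWarnings(as.integer(svn_rev))
  !is.na(rev) && rev >= min_rev
}

cat("This build is R", as.character(getRversion()),
    "at revision", R.version[["svn rev"]], "\n")
cat("Revision is at or after the fix:", includes_rterm_fix(), "\n")
```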