IAGO GINÉ VÁZQUEZ
2019-Sep-13 09:37 UTC
[Rd] Printing chinese characters (UTF-8) on R 3.5.2 -windows 10
But if I type

> "會"

the output is

[1] "會"

so seemingly it can be represented. Or am I wrong?

Best
Iago

________________________________
From: Tomas Kalibera <tomas.kalibera at gmail.com>
Sent: Friday, 13 September 2019 11:24
To: IAGO GINÉ VÁZQUEZ <i.gine at pssjd.org>; r-devel at r-project.org
Subject: Re: [Rd] Printing chinese characters (UTF-8) on R 3.5.2 -windows 10

On 9/13/19 11:01 AM, IAGO GINÉ VÁZQUEZ wrote:
> I have a Chinese character in a data frame, but printing it outputs its Unicode code point instead of the character. Concretely, the character is 會 and the code is U+6703. Following the code, I arrive at the call
>
> > base::format.default("會")
>
> which prints
>
> [1] "<U+6703>"
>
> I do not know the extent of this behaviour, nor whether it persists in more recent versions of R.
>
> Is it expected?

If you are running this on Windows in an encoding where the character cannot be represented (e.g. non-Chinese locale), then yes, this is expected behavior.

On Unix systems where R can run in UTF-8 encoding (Linux, macOS), the character will be formatted/displayed properly.

Best
Tomas

> Thank you!
>
> Iago
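For readers who want to reproduce the report, the following is a minimal sketch and not part of the original messages. It assumes only base R; the variable name `x` is illustrative, and the `\u6703` escape is used so that the script does not depend on the encoding of the source file.

```r
# Build the character from its code point so the script itself stays ASCII.
x <- "\u6703"

# Declared encoding of the string, as R has marked it.
Encoding(x)

# Code point and UTF-8 bytes of the character.
utf8ToInt(x)    # 26371, i.e. 0x6703
charToRaw(x)    # e6 9c 83

# TRUE if the native encoding of this session is UTF-8.
l10n_info()[["UTF-8"]]

# format() goes through the native encoding. In a session whose native
# encoding cannot represent the character, the thread reports "<U+6703>".
format(x)
```

If `l10n_info()[["UTF-8"]]` is `FALSE` and the locale is not a Chinese one, the escaped output of `format(x)` is the behaviour described above as expected.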
Tomas Kalibera
2019-Sep-13 09:53 UTC
[Rd] Printing chinese characters (UTF-8) on R 3.5.2 -windows 10
On 9/13/19 11:37 AM, IAGO GINÉ VÁZQUEZ wrote:
> But if I type
>
> > "會"
>
> the output is
>
> [1] "會"
>
> so seemingly it can be represented. Or, am I wrong?

In RGui you can print the string, because RGui is a Windows Unicode application (it uses UTF-16LE and bypasses the C runtime for strings). But that is just the GUI; R itself (and hence also packages) uses the current native encoding as defined by the C runtime. RGui makes sure R gets the string in UTF-8, but as soon as you do anything even slightly non-trivial, which includes formatting, the string is converted to the current native encoding.

Some R functions allow you to do certain things in UTF-8 without conversion to the native encoding, but you would have to read the documentation for each function very carefully. For practical use, you either need to live with the misinterpretation of some characters, use Windows in a locale where your characters can be represented (e.g. a Chinese locale when working with Chinese strings), or use Linux/macOS.

On Linux/macOS the current native encoding can be UTF-8, so there is no problem. On Windows, with the current toolchain based on mingw, this is not possible.

Best
Tomas
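The distinction between the native encoding and UTF-8 can be explored with a short sketch like the one below. It is an illustration under stated assumptions, not something taken from the messages: `is_representable` is a hypothetical helper name, it relies on `iconv()` returning `NA` for input it cannot convert (the documented default, although `iconv` behaviour varies somewhat between platforms), and the byte-level operations shown are only a few examples of working on the UTF-8 form directly.

```r
x <- "\u6703"

# Hypothetical helper: TRUE where `s` converts from UTF-8 to `encoding`.
# With the default sub = NA, iconv() yields NA for non-convertible input.
is_representable <- function(s, encoding) {
  !is.na(iconv(s, from = "UTF-8", to = encoding))
}

is_representable(x, "latin1")   # Latin-1 contains no CJK characters
is_representable(x, "UTF-8")

# Explicit conversion to the native encoding of the session.
enc2native(x)

# Counting works on the UTF-8 form, independent of the native encoding.
nchar(x, type = "chars")   # 1
nchar(x, type = "bytes")   # 3

# Write the UTF-8 bytes unchanged, then read them back declared as UTF-8.
tmp <- tempfile(fileext = ".txt")
writeLines(x, tmp, useBytes = TRUE)
y <- readLines(tmp, encoding = "UTF-8")
identical(charToRaw(y), charToRaw(x))
```

Whether a given function stays in UTF-8 or converts to the native encoding has to be checked in its documentation, as noted above.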
Ray Donnelly
2019-Sep-13 11:33 UTC
[Rd] Printing chinese characters (UTF-8) on R 3.5.2 -windows 10
On Fri, Sep 13, 2019 at 11:53 AM Tomas Kalibera <tomas.kalibera at gmail.com> wrote:
> [...]
> On Linux/macOS the current native encoding can be UTF-8, so there is no problem. On Windows, with the current toolchain based on mingw, this is not possible.

mingw-w64 is capable of processing UTF-8 (it can process bytes, after all). Can you explain what you mean here? Would any other compiler on Windows not suffer from this problem?