Duncan Murdoch
2018-Aug-12 16:33 UTC
[R] source script file that contains Unicode non-English characters
On 12/08/2018 11:48 AM, Faridedin Cheraghi wrote:
> that's right and I don't want to change my locale. my sessionInfo() :

I think it could be another manifestation of a known bug on Windows, where strings are converted from UTF-8 to the current locale and back to UTF-8, a lossy conversion. This has been present for many years, and requires a lot of internal changes to fix, so I wouldn't hold your breath waiting for a fix.

I believe the "right" fix is for R to always convert strings to UTF-8 internally. This wasn't possible when the internationalization code was added many years ago because not all platforms supported UTF-8. It would be a lot of work now, and since it isn't needed now on the platforms most developers use, it's not receiving a lot of attention.

Your workaround

    file(script,
         encoding = "UTF-8") %T>%
         source() %>%
         close()   # works fine

is a nice way to avoid this problem.

Duncan Murdoch

> R version 3.5.1 (2018-07-02)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows >= 8 x64 (build 9200)
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252
> [2] LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> thanks
>
> On Sun, Aug 12, 2018 at 8:00 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> On 12/08/2018 3:09 AM, Faridedin Cheraghi wrote:
>>> It was actually a .rmd file so you can get the coloring of the bug report
>>> in your text editor. I changed the format to .txt.
>>
>> When I run your script on a Mac (in a UTF-8 locale), all lines work
>> as expected. I'm guessing you are working on Windows, in a
>> non-UTF-8 locale?
>>
>> Posting sessionInfo() would be helpful.
>>
>> Duncan Murdoch
>>
>>> -Farid
>>>
>>> On Sun, Aug 12, 2018 at 7:24 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
>>>> ... and read the Posting Guide... only a few file types will ever make it
>>>> through the mailing list so repeatedly sending files not among those few
>>>> types would just be frustrating for everyone.
>>>>
>>>> On August 11, 2018 4:51:43 PM PDT, Jim Lemon <drjimlemon at gmail.com> wrote:
>>>>> Hi Farid,
>>>>> Whatever you attached has not gotten through.
>>>>>
>>>>> Jim
>>>>>
>>>>> On Sat, Aug 11, 2018 at 6:47 PM, Farid Ch <faridcher at gmail.com> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Please check the attached file.
>>>>>>
>>>>>> Thanks
>>>>>> Farid
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>> --
>>>> Sent from my phone. Please excuse my brevity.
Faridedin Cheraghi
2018-Aug-17 14:07 UTC
[R] source script file that contains Unicode non-English characters
Dear Duncan,

thanks for your feedback on this. Even if most developers are not on Windows (which I doubt), there are a huge number of people who use R on Windows, and I am one of them who works with R seriously. Following my own workaround to this bug, I have now hit another issue, with yet another workaround, when trying to render Farsi Unicode characters. While these workarounds work ad hoc, they are not appealing in all scenarios; I have hit other problems related to this bug, e.g., when documenting a package with the Roxygen2 package.

Please see the attached files (R scripts) for the complete bug report.

thanks
Farid

-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug01_right.png
Type: image/png
Size: 3913 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20180817/0f754c29/attachment-0004.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bug01_wrong.png
Type: image/png
Size: 6462 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20180817/0f754c29/attachment-0005.png>
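
For readers who want the workaround quoted earlier in this thread without the magrittr pipes, a minimal base R sketch might look like the following. The helper name `source_utf8` and the temporary demo script are hypothetical, introduced only for illustration. The demo script is deliberately ASCII-only, so it exercises the connection handling and does not prove that any particular non-ASCII text survives in a given locale or R version.

```r
# Hypothetical helper (not from the thread): source a script through a
# connection that declares the file's encoding as UTF-8.
source_utf8 <- function(path, ...) {
  con <- file(path, encoding = "UTF-8")  # connection is created, not yet opened
  on.exit(close(con), add = TRUE)        # destroy the connection even if source() fails
  source(con, ...)                       # source() reads the script lines from the connection
}

# Demo with a throwaway, ASCII-only script (an assumption made for portability).
demo_script <- tempfile(fileext = ".R")
writeLines("answer <- 6 * 7", demo_script)

demo_env <- new.env()
source_utf8(demo_script, local = demo_env)  # evaluate the script inside demo_env
print(demo_env$answer)

unlink(demo_script)
```

Compared with the piped version, `on.exit()` also closes the connection when the sourced script throws an error, which the `%T>%` chain would not do.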
Duncan Murdoch
2018-Aug-17 20:57 UTC
[R] source script file that contains Unicode non-English characters
On 17/08/2018 10:07 AM, Faridedin Cheraghi wrote:
> Dear Duncan,
>
> thanks for your feedback on this. Even though most developers are not in
> Windows (which I doubt it),

I'm talking about the R Core developers. I used to be one, but have retired from that role.

> there are a huge number of people who use R
> on Windows and I am one of them who seriously work with R.

Indeed, Microsoft promotes R, and they have a lot of developers; they just don't contribute much to R. Honestly I'd suggest that if you are serious about working with languages not supported in the default code page, you should switch platforms.

> Following my
> own workaround to this bug, now I hit another issue with another
> workaround when trying to render the Farsi Unicode characters. While
> these workarounds work in ad hoc, they are not appealing in all
> scenarios;I hit other problems related to this bug, e.g., when
> documenting a package with Roxygen2 package.
>
> Please see the attached files (r scripts) for the complete bug report.

If you think this is a new bug, you should report it to the bug tracking system (which requires you to be registered first). Posting it to me or to R-help will probably not result in any action on it. Posting it to the bug page will at least result in a fairly permanent record.

Duncan Murdoch
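
As an illustration of what "not supported in the default code page" means in practice, the sketch below first reports the session's locale and then uses base R's `iconv()` to push two strings from UTF-8 to code page 1252 and back, mimicking the round trip described earlier in the thread. It is not R's internal code path. The function name `round_trip` and the sample strings are made up for the example, and `"CP1252"` is assumed to be an encoding name that the platform's iconv recognizes.

```r
# Report whether the current session runs in a UTF-8 locale.
loc <- l10n_info()
cat("LC_CTYPE:", Sys.getlocale("LC_CTYPE"), "\n")
cat("UTF-8 locale:", isTRUE(loc[["UTF-8"]]), "\n")

# Sample strings written with \u escapes so this script itself stays ASCII.
farsi <- "\u0633\u0644\u0627\u0645"  # Farsi letters, which code page 1252 lacks
latin <- "caf\u00e9"                 # e-acute, which code page 1252 contains

# Hypothetical helper: convert UTF-8 -> code page -> UTF-8.
round_trip <- function(x, codepage = "CP1252") {
  # sub = "?" substitutes for bytes that cannot be converted,
  # instead of letting iconv() return NA for the whole string.
  native <- iconv(x, from = "UTF-8", to = codepage, sub = "?")
  iconv(native, from = codepage, to = "UTF-8")
}

latin_back <- round_trip(latin)
farsi_back <- round_trip(farsi)

cat("Latin text survives the round trip:", identical(latin_back, latin), "\n")
cat("Farsi text survives the round trip:", identical(farsi_back, farsi), "\n")
```

Text that fits the code page comes back unchanged, while text outside it cannot be recovered once the intermediate conversion has happened. That is the sense in which the conversion is lossy.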