Bill Dunlap
2022-Apr-25 15:52 UTC
[R] about opening R script Chinese annotation garble problem
The answer depends on the encoding of the file containing the Chinese characters and on the version of R (since you are using Windows). I copied your subject line into Wordpad and added some syntax to make a valid R expression

   s <- "永创 via R-help"

I then saved it with the type "Unicode Text Document". In my version of Wordpad this means UTF-16. The bytes in the file are

4.2.0> readBin("Chinese-utf-16.txt", what="raw", n=file.size("Chinese-utf-16.txt"))
 [1] ff fe 73 00 20 00 3c 00 2d 00 20 00 22 00 38 6c 1b 52
[19] 20 00 76 00 69 00 61 00 20 00 52 00 2d 00 68 00 65 00
[37] 6c 00 70 00 22 00 0d 00 0a 00

All the nulls in the file are a hint that this is encoded using UTF-16, not UTF-8.

With R-4.2.0 (released a few days ago) I can source the file with

4.2.0> source("Chinese-utf-16.txt", encoding="UTF-16")
4.2.0> s
[1] "永创 via R-help"
4.2.0> Encoding(s)
[1] "UTF-8"

With R-4.1.2 I get

> source("Chinese-utf-16.txt", encoding="UTF-16")
Error in source("Chinese-utf-16.txt", encoding = "UTF-16") :
  Chinese-utf-16.txt:1:6: unexpected INCOMPLETE_STRING
1: s <- "
        ^
In addition: Warning message:
In readLines(file, warn = FALSE) :
  invalid input found on input connection 'Chinese-utf-16.txt'

In 4.1.2, passing a connection that declares the file's encoding avoids the error, but the characters come out as escapes unless encoding="UTF-8" is also given to source():

> source(file("Chinese-utf-16.txt", encoding="UTF-16"))
> s
[1] "<U+6C38><U+521B> via R-help"
> source(file("Chinese-utf-16.txt", encoding="UTF-16"), encoding="UTF-8")
> s
[1] "永创 via R-help"
> Encoding(s)
[1] "UTF-8"
> charToRaw(s)
 [1] e6 b0 b8 e5 88 9b 20 76 69 61 20 52 2d 68 65 6c 70

R-4.2.0 makes this much easier.

-Bill

On Mon, Apr 25, 2022 at 1:04 AM 永创 via R-help <r-help at r-project.org> wrote:

> Garbled characters appear in Chinese annotation when opening program
> script using RGui (see attached picture). I use a variety of methods have
> not been solved, I hope to help me solve this problem. Thank you.
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]
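
The byte-level check described in this message can be sketched in R. This is only an illustration: the helper name guess_unicode_encoding and the rule of treating null bytes without a byte-order mark as a sign of UTF-16LE are assumptions made for the example, not something taken from the thread. The sample bytes are the 46 bytes shown in the readBin() output.

```r
# Hypothetical helper: guess a Unicode encoding from a byte-order mark (BOM)
# or, failing that, from the presence of null bytes. A heuristic only.
guess_unicode_encoding <- function(path, n = 4096L) {
  bytes <- readBin(path, what = "raw", n = n)
  starts_with <- function(prefix) {
    length(bytes) >= length(prefix) &&
      identical(bytes[seq_along(prefix)], as.raw(prefix))
  }
  # "UTF-8-BOM" is the name file() accepts for reading UTF-8 with a BOM.
  if (starts_with(c(0xef, 0xbb, 0xbf))) return("UTF-8-BOM")
  if (starts_with(c(0xff, 0xfe)) || starts_with(c(0xfe, 0xff))) return("UTF-16")
  # Assumption: nulls without a BOM most likely mean little-endian UTF-16.
  if (any(bytes == as.raw(0))) return("UTF-16LE")
  "UTF-8"
}

# Recreate the file from the readBin() output in the message.
sample_bytes <- as.raw(c(
  0xff, 0xfe, 0x73, 0x00, 0x20, 0x00, 0x3c, 0x00, 0x2d, 0x00, 0x20, 0x00,
  0x22, 0x00, 0x38, 0x6c, 0x1b, 0x52, 0x20, 0x00, 0x76, 0x00, 0x69, 0x00,
  0x61, 0x00, 0x20, 0x00, 0x52, 0x00, 0x2d, 0x00, 0x68, 0x00, 0x65, 0x00,
  0x6c, 0x00, 0x70, 0x00, 0x22, 0x00, 0x0d, 0x00, 0x0a, 0x00
))
path <- tempfile(fileext = ".txt")
writeBin(sample_bytes, path)

enc <- guess_unicode_encoding(path)       # BOM ff fe is found first
n_nulls <- sum(sample_bytes == as.raw(0)) # one null per ASCII character
print(enc)
print(n_nulls)
```

The guessed name could then be passed as the encoding argument in the source() calls shown in the message. A file with neither a BOM nor null bytes falls through to "UTF-8", which is a guess and not a guarantee.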
Bill Dunlap
2022-Apr-25 17:52 UTC
[R] about opening R script Chinese annotation garble problem
If your file is encoded as UTF-8 (as most text on the internet is, in which case there will be no null bytes in the file), then R-4.2.0 on a recent enough version of Windows can source() it without mentioning the encoding.

-Bill
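
If a script turns out to be UTF-16, one option is to re-save it as UTF-8 so that it can be sourced directly. What follows is a minimal sketch, assuming the file starts with the little-endian BOM (ff fe) seen earlier in the thread. The temporary file names and the use of iconv() on raw bytes are choices made for illustration, and source() is only attempted when the session's native encoding is UTF-8.

```r
# Sample UTF-16 script (little-endian, with BOM), the bytes of: s <- "永创 via R-help"
utf16_bytes <- as.raw(c(
  0xff, 0xfe, 0x73, 0x00, 0x20, 0x00, 0x3c, 0x00, 0x2d, 0x00, 0x20, 0x00,
  0x22, 0x00, 0x38, 0x6c, 0x1b, 0x52, 0x20, 0x00, 0x76, 0x00, 0x69, 0x00,
  0x61, 0x00, 0x20, 0x00, 0x52, 0x00, 0x2d, 0x00, 0x68, 0x00, 0x65, 0x00,
  0x6c, 0x00, 0x70, 0x00, 0x22, 0x00, 0x0d, 0x00, 0x0a, 0x00
))
utf16_path <- tempfile(fileext = ".txt")
writeBin(utf16_bytes, utf16_path)

# Read the raw bytes and check for the BOM this sketch assumes.
raw_in <- readBin(utf16_path, what = "raw", n = file.size(utf16_path))
has_le_bom <- length(raw_in) >= 2L &&
  identical(raw_in[1:2], as.raw(c(0xff, 0xfe)))
stopifnot(has_le_bom)

# Drop the BOM and convert; iconv() accepts a list of raw vectors.
utf8_text <- iconv(list(raw_in[-(1:2)]), from = "UTF-16LE", to = "UTF-8")

# Write the UTF-8 bytes as they are, without a BOM.
utf8_path <- tempfile(fileext = ".R")
writeBin(charToRaw(utf8_text), utf8_path)

utf8_bytes <- readBin(utf8_path, what = "raw", n = file.size(utf8_path))
print(any(utf8_bytes == as.raw(0)))  # the converted file has no null bytes

# Source only when the native encoding is UTF-8
# (as in R 4.2.0 on a recent enough version of Windows).
if (isTRUE(l10n_info()[["UTF-8"]])) {
  env <- new.env()
  source(utf8_path, local = env, encoding = "UTF-8")
  print(charToRaw(env$s))
}
```

Writing with writeBin() rather than through a text connection keeps R from re-encoding the text to the native encoding on the way out.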
Tomas Kalibera
2022-May-24 21:42 UTC
[R] about opening R script Chinese annotation garble problem
On 4/25/22 19:52, Bill Dunlap wrote:
> If your file is encoded as UTF-8 (as most stuff on the internet is, there
> will be no null bytes in the file), then R-4.2.0 on a recent enough version
> of Windows can source() it without mentioning the encoding.

And the same applies to scripts opened in Rgui via "Open script". For R 4.2.0 on recent Windows, the script file must be encoded as UTF-8. I've tested it on Bill's example expression and it works on my system.

However, please note that running a complete line of the script using Ctrl-R doesn't work, due to a bug that has already been fixed; the fix will appear in R 4.2.1. In R 4.2.0, you need to select the text before pressing Ctrl-R (or copy and paste it using Ctrl-C/Ctrl-V) if it contains non-ASCII characters.

Tomas
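
Since Rgui expects a UTF-8 script here, it may help to check a file's bytes before opening it. The following is a rough sketch, not an official check: the helper names is_utf8_file and write_bytes are made up for the example, and the three sample files are tiny byte sequences constructed only to exercise it.

```r
# Hypothetical helper: TRUE if the file's bytes form valid UTF-8 text.
is_utf8_file <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  # rawToChar() cannot handle embedded nulls, and nulls suggest UTF-16 anyway.
  if (any(bytes == as.raw(0))) return(FALSE)
  validUTF8(rawToChar(bytes))
}

# Hypothetical helper: write the given byte values to a temporary file.
write_bytes <- function(bytes) {
  path <- tempfile()
  writeBin(as.raw(bytes), path)
  path
}

# UTF-8 bytes of: s <- "永创" followed by a newline
utf8_file <- write_bytes(c(0x73, 0x20, 0x3c, 0x2d, 0x20, 0x22,
                           0xe6, 0xb0, 0xb8, 0xe5, 0x88, 0x9b, 0x22, 0x0a))
# UTF-16 (little-endian, with BOM) bytes of: s followed by a newline
utf16_file <- write_bytes(c(0xff, 0xfe, 0x73, 0x00, 0x0a, 0x00))
# Two quoted bytes that do not form a valid UTF-8 sequence
other_file <- write_bytes(c(0x22, 0xd3, 0xc0, 0x22))

results <- c(
  utf8  = is_utf8_file(utf8_file),
  utf16 = is_utf8_file(utf16_file),
  other = is_utf8_file(other_file)
)
print(results)
```

A pure ASCII script also passes this check, because ASCII is a subset of UTF-8. A file that fails it would need to be re-saved as UTF-8 in an editor first.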