In R 3.5.0 using the `encoding' argument of source() prevents loading files from the internet; without the `encoding' argument files can be loaded from the internet, but if they contain non-ascii characters, these are not correctly displayed under MS-Windows (but they are correctly displayed under GNU/Linux). With R 3.4.{2,3,4} there is no such problem: using `encoding' the files are loaded and non-ascii characters are correctly displayed under MS-Windows (but not without `encoding'). Here is a transcript from R 3.5.0 under GNU/Linux (the URLs are real, in case anyone wants to try and reproduce the problem):

> ls()
character(0)
> source("http://home.versanet.de/~s-berman/source1.R", encoding="UTF-8")
> ls()
character(0)
> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
> ls()
character(0)
> source("http://home.versanet.de/~s-berman/source1.R")
> ls()
[1] "source.test1"
> source("http://home.versanet.de/~s-berman/source2.R")
> ls()
[1] "source.test1" "source.test2"
> source.test1()
[1] "This is a test."
> source.test2()
[1] "Non-ascii: äöüß"

(The four non-ascii characters are Unicode 0xE4, 0xF6, 0xFC, 0xDF.) With 3.5.0 under MS-Windows, the transcript is the same except for the display of the last output, which is this:

[1] "Non-ascii: ????????"

(Here there are eight non-ascii characters, which display the Unicode decompositions of the four non-ascii characters above.)

Here is a transcript from R 3.4.3 under MS-Windows (under GNU/Linux it's the same except that the non-ascii characters are also correctly displayed even without the `encoding' argument):

> ls()
character(0)
> source("http://home.versanet.de/~s-berman/source1.R")
> ls()
[1] "source.test1"
> source("http://home.versanet.de/~s-berman/source2.R")
> ls()
[1] "source.test1" "source.test2"
> source.test1()
[1] "This is a test."
> source.test2()
[1] "Non-ascii: ????????"
> rm(source.test2)
> ls()
[1] "source.test1"
> source("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
> ls()
[1] "source.test1" "source.test2"
> source.test2()
[1] "Non-ascii: äöüß"

I did a web search but didn't find any reports of this issue, nor did I see any relevant entry in the 3.5.0 NEWS, so this looks like a bug, but maybe I've overlooked something. I'd be grateful for any enlightenment.

Steve Berman
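
One plausible reading of the eight-character display (an assumption, not something the report confirms) is that each of the four characters occupies two bytes in UTF-8, and that those eight bytes are shown one character per byte when the text is not treated as UTF-8. A minimal sketch that checks the byte counts only; the variable names are illustrative:

```r
# The four characters named in the report: U+00E4, U+00F6, U+00FC, U+00DF.
s <- "\u00e4\u00f6\u00fc\u00df"

code_points <- utf8ToInt(s)          # one integer per character
utf8_bytes  <- charToRaw(s)          # the UTF-8 bytes behind the string

n_chars <- nchar(s, type = "chars")  # characters in the UTF-8 string
n_bytes <- nchar(s, type = "bytes")  # bytes in the UTF-8 string

# Reinterpret the same bytes as a single-byte encoding (Latin-1 here, as an
# assumed stand-in for a Windows code page): every byte becomes one character.
misread <- rawToChar(utf8_bytes)
Encoding(misread) <- "latin1"
n_misread <- nchar(enc2utf8(misread), type = "chars")

print(code_points)
print(utf8_bytes)
print(c(chars = n_chars, bytes = n_bytes, misread = n_misread))
```

If this reading is right, the four-versus-eight difference in the transcripts would simply reflect whether the bytes were decoded as UTF-8 or one byte at a time.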
Looks like this actually comes from readLines(), nothing to do with source() as such: In current R-devel (still):

> f <- file("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
> readLines(f)
character(0)
> close(f)
> f <- file("http://home.versanet.de/~s-berman/source2.R")
> readLines(f)
[1] "source.test2 <- function() {" " print(\"Non-ascii: äöüß\")"
[3] "}"

-pd

> On 2 Jun 2018, at 15:37 , Stephen Berman <stephen.berman at gmx.net> wrote:
>
> In R 3.5.0 using the `encoding' argument of source() prevents loading
> files from the internet; [...]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>>>>> peter dalgaard
>>>>>     on Sun, 3 Jun 2018 23:51:24 +0200 writes:

    > Looks like this actually comes from readLines(), nothing
    > to do with source() as such: In current R-devel (still):

    >> f <- file("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
    >> readLines(f)
    > character(0)
    >> close(f)
    >> f <- file("http://home.versanet.de/~s-berman/source2.R")
    >> readLines(f)
    > [1] "source.test2 <- function() {" " print(\"Non-ascii: äöüß\")"
    > [3] "}"

    > -pd

and that's not even readLines(), but rather how exactly the connection is defined [even in your example above]

  > urlR <- "http://home.versanet.de/~s-berman/source2.R"
  > readLines(urlR, encoding="UTF-8")
  [1] "source.test2 <- function() {" " print(\"Non-ascii: äöüß\")"
  [3] "}"
  > f <- file(urlR, encoding = "UTF-8")
  > readLines(f)
  character(0)

and the same behavior with scan() instead of readLines() :

  > scan(urlR,"") # works
  Read 7 items
  [1] "source.test2"       "<-"          "function()"  "{"
  [5] "print(\"Non-ascii:" "äöüß\")"     "}"
  > scan(f,"") # fails
  Read 0 items
  character(0)
  >

So it seems as if the bug is in the file() [or url()] C code ..

But then we also have to consider Windows .. where I think most changes have happened during the R-3.4.4 --> R-3.5.0 transition.
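
Until the connection code is fixed, a possible workaround can be sketched from what the transcripts above show: reading the lines without `encoding` on the connection still works, so the `encoding` information can be supplied at the parsing step instead. This is only a sketch under assumptions, not something proposed in the thread: `source_utf8` is a hypothetical helper name, and a temporary local file stands in for source2.R so that the example needs no network access (the function in it returns the string rather than printing it).

```r
# Hypothetical helper, not part of R: evaluate a UTF-8 encoded script
# without passing `encoding` to file(), url() or source().
source_utf8 <- function(path, envir = globalenv()) {
  # Plain readLines(): no re-encoding, the UTF-8 bytes come through as-is.
  lines <- readLines(path, warn = FALSE)
  # parse(encoding = "UTF-8") marks string literals as UTF-8; it does not
  # re-encode the input.
  exprs <- parse(text = lines, encoding = "UTF-8", keep.source = FALSE)
  invisible(eval(exprs, envir = envir))
}

# Local stand-in for source2.R (assumption: similar content), written as
# raw UTF-8 bytes.
tmp <- tempfile(fileext = ".R")
writeLines(c("source.test2 <- function() {",
             "  \"Non-ascii: \u00e4\u00f6\u00fc\u00df\"",
             "}"),
           tmp, useBytes = TRUE)

env <- new.env()
source_utf8(tmp, envir = env)   # a URL string could be passed the same way
res <- env$source.test2()
unlink(tmp)

print(Encoding(res))
print(charToRaw(res))
```

Because the bytes are never re-encoded on the way in, this sketch sidesteps the `encoding` argument of the connection entirely; whether the result displays correctly on a given console still depends on that console.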