Thanks for the report, fixed in R-devel (74848).
Best
Tomas
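
For anyone still on an affected release, here is a minimal workaround sketch. It avoids opening a URL connection with an encoding by downloading the script to a temporary file first and sourcing that local copy, which per Steve's message below works with encoding="UTF-8". The helper name source_via_tempfile is hypothetical and not part of R; download.file() and source() are the only real calls used. This is an assumption about what is convenient, not a description of the fix in R-devel.

```r
# Hypothetical helper (not part of R): fetch a script to a temporary
# file, then source the local copy with the requested encoding.
source_via_tempfile <- function(u, encoding = "UTF-8", ...) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)
  # mode = "wb" stores the bytes exactly as served
  status <- download.file(u, tmp, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed: ", u)
  source(tmp, encoding = encoding, ...)
}

# Usage with the URL from this thread (requires network access):
# source_via_tempfile("http://home.versanet.de/~s-berman/source2.R")
```

Because the encoding is applied only when reading the local file, this sidesteps the url()/file() code path that returns character(0) in the reports below.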
On 06/04/2018 02:41 PM, NELSON, Michael wrote:
> On R 3.5.0 (Mac)
>
> The issue appears when using the default (libcurl) method and specifying the encoding
>
> Note that using method='internal' causes a segfault if used in conjunction with encoding (and works when encoding is not set).
>
> urlR <- "http://home.versanet.de/~s-berman/source2.R"
> # works
> url_default <- url(urlR)
> scan(url_default, "")
> # Read 7 items
> # [1] "source.test2" "<-" "function()" "{" "print(\"Non-ascii:" "????\")"
> # [7] "}"
>
> url_default_en <- url(urlR, encoding = "UTF-8")
> scan(url_default_en, "")
> # Read 0 items
> # character(0)
> url_internal <- url(urlR, method = 'internal')
> scan(url_internal, "")
> # Read 7 items
> # [1] "source.test2" "<-" "function()" "{" "print(\"Non-ascii:" "????\")"
> # [7] "}"
>
> url_internal_en <- url(urlR, encoding = "UTF-8", method = 'internal')
> #scan(url_internal_en, "")
> #*** caught segfault ***
> # address 0x0, cause 'memory not mapped'
>
> url_libcurl <- url(urlR, method = 'libcurl')
> scan(url_libcurl, "")
> # Read 7 items
> # [1] "source.test2" "<-" "function()" "{" "print(\"Non-ascii:" "????\")"
> # [7] "}"
> url_libcurl_en <- url(urlR, encoding = "UTF-8", method = 'libcurl')
> scan(url_libcurl_en, "")
> # Read 0 items
> # character(0)
>
>
> Michael
>
> ________________________________________
> From: R-devel [r-devel-bounces at r-project.org] on behalf of Stephen Berman [stephen.berman at gmx.net]
> Sent: Monday, 4 June 2018 7:26 PM
> To: Martin Maechler
> Cc: R-devel
> Subject: Re: [Rd] encoding argument of source() in 3.5.0
>
> On Mon, 4 Jun 2018 10:44:11 +0200 Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>
>>>>>>> peter dalgaard
>>>>>>> on Sun, 3 Jun 2018 23:51:24 +0200 writes:
>> > Looks like this actually comes from readLines(), nothing
>> > to do with source() as such: In current R-devel (still):
>>
>> >> f <- file("http://home.versanet.de/~s-berman/source2.R", encoding="UTF-8")
>> >> readLines(f)
>> > character(0)
>> >> close(f)
>> >> f <- file("http://home.versanet.de/~s-berman/source2.R")
>> >> readLines(f)
>> > [1] "source.test2 <- function() {" " print(\"Non-ascii: ????\")"
>> > [3] "}"
>>
>> > -pd
>>
>> and that's not even readLines(), but rather how exactly the
>> connection is defined [even in your example above]
>>
>> > urlR <- "http://home.versanet.de/~s-berman/source2.R"
>> > readLines(urlR, encoding="UTF-8")
>> [1] "source.test2 <- function() {" " print(\"Non-ascii: ????\")"
>> [3] "}"
>> > f <- file(urlR, encoding = "UTF-8")
>> > readLines(f)
>> character(0)
>>
>> and the same behavior with scan() instead of readLines() :
>>
>>> scan(urlR,"") # works
>> Read 7 items
>> [1] "source.test2" "<-" "function()" "{"
>> [5] "print(\"Non-ascii:" "????\")" "}"
>>> scan(f,"") # fails
>> Read 0 items
>> character(0)
>> So it seems as if the bug is in the file() [or url()] C code ..
> Yes, the problem seems to be restricted to loading files from a
> (non-local) URL; i.e. this works fine on my computer:
>
> > source("file:///home/steve/prog/R/source2.R", encoding="UTF-8")
>
> Also, I noticed this works too:
>
> > read.table("http://home.versanet.de/~s-berman/table2", encoding="UTF-8", skip=1)
>
> where (if I read the source correctly) using `skip=1' makes read.table()
> call readLines(). (The read.table() invocation also works without
> `skip'.)
>
>> But then we also have to consider Windows .. where I think most changes have
>> happened during the R-3.4.4 --> R-3.5.0 transition.
> Yes, please. I need (or at least it would be convenient) to be able to
> load R code containing non-ascii characters from the web under
> MS-Windows.
>
> Steve Berman
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel