thr3ads.net - R devel - [Rd] dowload.file(method="libcurl") and GET vs. HEAD requests [Jun 2016]


Winston Chang

2016-Jun-22 01:35 UTC

[Rd] download.file(method="libcurl") and GET vs. HEAD requests

In R 3.2.4, download.file(method="libcurl") issues a single HTTP GET
request for the file. In R 3.3.0, however, it issues an HTTP HEAD
request first, and then a GET request. This causes problems when the
web server returns an error for the HEAD request even though the file
is available via GET.

Is it possible to tell download.file to simply send a GET request,
without first sending a HEAD request?

In theory, a web server should give the same response to HEAD and GET
requests, except that for a HEAD request it sends only the headers and
not the content. However, not all web servers do this for all files.
I've seen this problem come up in two different places.

The first is from an issue that someone filed for the downloader
package. The following works in R 3.2.4, but in R 3.3.0, it fails with
a 404 (tested on a Mac):
    options(internet.info=1) # Show verbose download info
    url <- "https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011_gen.zip"
    download.file(url, destfile = "out.zip", method="libcurl")

In R 3.3.0, the download succeeds with method="wget" and with
method="curl". Only method="libcurl" has problems.

The second place I've encountered a problem is in downloading attached
files from a GitHub release.
    options(internet.info=1) # Show verbose download info
    url <- "https://github.com/wch/webshot/releases/download/v0.3/phantomjs-2.1.1-macosx.zip"
    download.file(url, destfile = "out.zip")

This one fails with a 403 Forbidden because it gets redirected to a
URL in Amazon S3, where a signature is embedded in the URL. However,
the signature is computed over the request, including the request
type (HEAD vs. GET), and so the same URL doesn't work for both. (See
http://stackoverflow.com/a/20580036/412655)

Any help would be appreciated!
-Winston

Martin Morgan

2016-Jun-22 02:45 UTC

I think I introduced this, in

------------------------------------------------------------------------
r69280 | morgan | 2015-09-03 06:24:49 -0400 (Thu, 03 Sep 2015) | 4 lines

don't create empty file on 404 and similar errors

- download.file(method="libcurl")

------------------------------------------------------------------------

The idea was to test that the file can be downloaded before trying to 
download it; previously R would download the error page as though it 
were the content.

I'll give this some thought.

Martin Morgan
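
The goal described for r69280 (no empty or error-page file left behind
on a 404) can also be approached without a preliminary HEAD request.
The following is only a minimal sketch under stated assumptions, not
what R itself does: download_atomically() and its fetch argument are
hypothetical names, and the sketch assumes that the underlying
downloader signals failure either by an error or by a non-zero return
value. The file is fetched with a single GET into a temporary file and
moved into place only on success.

```r
# Hypothetical helper, not part of R: fetch into a temporary file and only
# move it to `destfile` when the transfer reports success.
# `fetch` defaults to utils::download.file; it is a parameter so that the
# logic can be exercised without network access.
download_atomically <- function(url, destfile,
                                fetch = utils::download.file, ...) {
  # Create the temporary file next to destfile so the rename below does
  # not have to cross file systems.
  tmp <- tempfile(tmpdir = dirname(destfile))
  on.exit(unlink(tmp), add = TRUE)

  # Treat both a signalled error and a non-zero return value as failure.
  status <- tryCatch(fetch(url, tmp, ...), error = function(e) e)
  failed <- inherits(status, "error") ||
    !identical(as.integer(status), 0L) ||
    !file.exists(tmp)
  if (failed) {
    stop("download failed for ", url, call. = FALSE)
  }

  if (!file.rename(tmp, destfile)) {
    stop("could not move download into place: ", destfile, call. = FALSE)
  }
  invisible(destfile)
}
```

With the default fetch, a call such as
download_atomically(url, "out.zip", method = "libcurl") passes the
method argument through to download.file. Whether a given R version
reports HTTP errors as failures for a given method is an assumption
that should be checked before relying on this.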


Winston Chang

2016-Jun-22 16:01 UTC

Thanks for looking into it. Is there a way to avoid the HEAD request
in R 3.3.0? I'm asking because if there isn't, then I'll add a
workaround in a package I'm working on.

-Winston
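
One possible shape for such a package-level workaround, given that the
original report found method="wget" and method="curl" working where
method="libcurl" failed, is to try several methods in turn. This is a
rough sketch only: download_with_fallback() and its fetch argument are
hypothetical names, and it assumes that a failed attempt shows up as an
error or a non-zero return value from download.file.

```r
# Hypothetical workaround helper, not part of R: try each download method
# in order and return (invisibly) the name of the first one that succeeds.
# `fetch` defaults to utils::download.file and is a parameter only so the
# fallback logic can be tested without network access.
download_with_fallback <- function(url, destfile,
                                   methods = c("libcurl", "curl", "wget"),
                                   fetch = utils::download.file, ...) {
  for (m in methods) {
    # "curl" and "wget" shell out to external programs; skip if not on PATH.
    if (m %in% c("curl", "wget") && !nzchar(Sys.which(m))) next

    status <- tryCatch(fetch(url, destfile, method = m, ...),
                       error = function(e) e)
    if (!inherits(status, "error") && identical(as.integer(status), 0L)) {
      return(invisible(m))
    }

    # Discard partial or error-page output before the next attempt.
    # Note that this also removes a destfile that existed beforehand.
    unlink(destfile)
  }
  stop("all download methods failed for ", url, call. = FALSE)
}
```

Usage would look like download_with_fallback(url, "out.zip"). The
thread only reports the curl and wget methods succeeding for the first
example URL, so whether the fallback helps for the GitHub release
download is an assumption to verify.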

