>>>>> Viechtbauer, Wolfgang (SP)
>>>>> on Fri, 8 Jan 2021 13:50:14 +0000 writes:
> Instead of a separate file to store such a list, would it be an idea to
> add versions of the \href{}{} and \url{} markup commands that are
> skipped by the URL checks?
> Best,
> Wolfgang
I think John Nash and you misunderstood -- or perhaps I misunderstood --
the original proposal:
My understanding has been that there should be a "central repository" of
URL exceptions that is maintained by volunteers,
and rather *not* that package authors should get ways to skip
URL checking.

Martin
>> -----Original Message-----
>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of
>> Spencer Graves
>> Sent: Friday, 08 January, 2021 13:04
>> To: r-devel at r-project.org
>> Subject: Re: [Rd] URL checks
>>
>> I also would be pleased to be allowed to provide "a list of known
>> false-positive/exceptions" to the URL tests. I've been challenged
>> multiple times regarding URLs that worked fine when I checked them.
>> We should not be required to do a partial lobotomy to pass
>> R CMD check ;-)
>>
>> Spencer Graves
>>
>> On 2021-01-07 09:53, Hugo Gruson wrote:
>>>
>>> I encountered the same issue today with
>>> https://astrostatistics.psu.edu/.
>>>
>>> This is a trust chain issue, as explained here:
>>> https://whatsmychaincert.com/?astrostatistics.psu.edu.
>>>
>>> I've worked for a couple of years on a project to increase HTTPS
>>> adoption on the web, and we noticed that this type of error is very
>>> common and that website maintainers are often unresponsive to
>>> requests to fix this issue.
>>>
>>> Therefore, I totally agree with Kirill that a list of known
>>> false-positive/exceptions would be a great addition to save time
>>> for both the CRAN team and package developers.
>>>
>>> Hugo
>>>
>>> On 07/01/2021 15:45, Kirill Müller via R-devel wrote:
>>>> One other failure mode: SSL certificates trusted by browsers that
>>>> are not installed on the check machine, e.g. the "GEANT Vereniging"
>>>> certificate from https://relational.fit.cvut.cz/ .
>>>>
>>>> K
>>>>
>>>> On 07.01.21 12:14, Kirill Müller via R-devel wrote:
>>>>> Hi
>>>>>
>>>>> The URL checks in R CMD check test all links in the README and
>>>>> vignettes for broken or redirected links. While in many cases this
>>>>> improves documentation, I see problems with this approach, which
>>>>> I have detailed below.
>>>>>
>>>>> I'm writing to this mailing list because I think the change needs
>>>>> to happen in R's check routines. I propose to introduce an
>>>>> "allow-list" for URLs, to reduce the burden on both CRAN and
>>>>> package maintainers.
>>>>>
>>>>> Comments are greatly appreciated.
>>>>>
>>>>> Best regards
>>>>>
>>>>> Kirill
>>>>>
>>>>> # Problems with the detection of broken/redirected URLs
>>>>>
>>>>> ## 301 should often be 307, how to change?
>>>>>
>>>>> Many web sites use a 301 redirection code that probably should be
>>>>> a 307. For example, https://www.oracle.com and
>>>>> https://www.oracle.com/ both redirect to
>>>>> https://www.oracle.com/index.html with a 301. I suspect the
>>>>> company still wants oracle.com to be recognized as the primary
>>>>> entry point of their web presence (to reserve the right to move
>>>>> the redirection to a different location later), though I haven't
>>>>> checked with their PR department. If that's true, the redirect
>>>>> probably should be a 307, which should be fixed by their IT
>>>>> department, which I haven't contacted yet either.
>>>>>
>>>>> $ curl -i https://www.oracle.com
>>>>> HTTP/2 301
>>>>> server: AkamaiGHost
>>>>> content-length: 0
>>>>> location: https://www.oracle.com/index.html
>>>>> ...
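[The permanent/temporary split Kirill alludes to is fixed by HTTP semantics (301/308 permanent, 302/303/307 temporary). A checker that wanted to flag only permanent redirects could classify status codes along these lines -- a hypothetical Python sketch, not part of any proposal in this thread:]

```python
# Classify HTTP redirect status codes by their standard semantics.
# A link checker might treat a permanent redirect (301, 308) as
# "please update the stored URL" while tolerating temporary ones
# (302, 303, 307), where the original URL remains the canonical one.

PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}

def redirect_kind(status: int) -> str:
    """Return 'permanent', 'temporary', or 'none' for an HTTP status."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return "none"

print(redirect_kind(301))  # -> permanent
print(redirect_kind(307))  # -> temporary
print(redirect_kind(200))  # -> none
```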
>>>>>
>>>>> ## User agent detection
>>>>>
>>>>> twitter.com responds with a 400 error for requests without a user
>>>>> agent string hinting at an accepted browser.
>>>>>
>>>>> $ curl -i https://twitter.com/
>>>>> HTTP/2 400
>>>>> ...
>>>>> <body>...<p>Please switch to a supported browser...</p>...</body>
>>>>>
>>>>> $ curl -s -i https://twitter.com/ -A "Mozilla/5.0 (X11; Ubuntu; \
>>>>>   Linux x86_64; rv:84.0) Gecko/20100101 Firefox/84.0" | head -n 1
>>>>> HTTP/2 200
>>>>> # Impact
>>>>>
>>>>> While the latter problem *could* be fixed by supplying a
>>>>> browser-like user agent string, the former problem is virtually
>>>>> unfixable -- so many web sites should use 307 instead of 301 but
>>>>> don't. The above list is also incomplete -- think of unreliable
>>>>> links, HTTP links, other failure modes...
>>>>>
>>>>> This affects me as a package maintainer: I have the choice to
>>>>> either change the links to incorrect versions or remove them
>>>>> altogether.
>>>>>
>>>>> I could also explain each broken link to CRAN, but I think this
>>>>> subjects the team to undue burden. Submitting a package with NOTEs
>>>>> delays the release; for a package which I must release very soon
>>>>> to avoid having it pulled from CRAN, I'd rather not risk that --
>>>>> hence I need to remove the link and put it back later.
>>>>>
>>>>> I'm aware of https://github.com/r-lib/urlchecker, which
>>>>> alleviates the problem but ultimately doesn't solve it.
>>>>>
>>>>> # Proposed solution
>>>>>
>>>>> ## Allow-list
>>>>>
>>>>> A file inst/URL that lists all URLs where failures are allowed --
>>>>> possibly with a list of the HTTP codes accepted for that link.
>>>>>
>>>>> Example:
>>>>>
>>>>> https://oracle.com/ 301
>>>>> https://twitter.com/drob/status/1224851726068527106 400
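[The proposed inst/URL format could be consumed by check code along these lines -- a hypothetical Python sketch; the file format, and the rule that a listed code suppresses that failure, are read off Kirill's example above, and none of this exists in R:]

```python
def parse_allow_list(text):
    """Parse lines of the form '<url> [code ...]' into a dict.

    An empty code list is taken to mean any failure is allowed
    for that URL (an assumption; the proposal leaves this open).
    """
    allowed = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        url, codes = fields[0], {int(c) for c in fields[1:]}
        allowed[url] = codes
    return allowed

def failure_allowed(allowed, url, status):
    """True if a check failure for `url` with HTTP `status` is listed."""
    codes = allowed.get(url)
    if codes is None:
        return False        # URL not listed: report the failure
    return not codes or status in codes

EXAMPLE = """\
https://oracle.com/ 301
https://twitter.com/drob/status/1224851726068527106 400
"""

allowed = parse_allow_list(EXAMPLE)
print(failure_allowed(allowed, "https://oracle.com/", 301))  # True
print(failure_allowed(allowed, "https://oracle.com/", 404))  # False
```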
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel