Hugh Parsonage
2018-Mar-21 12:07 UTC
[Rd] Proposal to reduce check times by skipping GitHub pulls and issues URL checks
When a package is submitted to CRAN, part of the quality control (QC) process is to ensure that any URLs in the package are valid. While this requirement is sound, it can add considerably to check times, since each URL takes around a second to check. Around 70,000 URLs on CRAN are currently checked, of which around 12,000 have a github.com domain (by far the most common domain; the next most common, doi.org, has fewer than 3,000).

I propose that the QC process be slightly weakened to skip checks of URLs that point to a pull request or issue of a repository, provided the repository URL itself has been checked. This patch would skip around 5,000 URLs.

I claim that this would not weaken the quality control process in practice. While the patch would skip invalid URLs like https://github.com/<repo>/<package>/issues/9999999999999, I think it is much more likely that a URL would point to the wrong issue or pull request than to one that does not exist. Since the current QC doesn't check whether a valid link points to the intended page, my proposal would not be a real change in this regard. The patch should not affect the QC of packages with no github.com URLs at all.

This proposal was motivated by a recent, somewhat regrettable change to the data.table package. That package had over 500 such URLs in its NEWS file, which took so long to check that they choked the R CMD check process. As a result, the NEWS file was split, which avoided the checks but makes it harder to navigate historical changes.

Best,
Hugh Parsonage
Grattan Institute