Dirk Eddelbuettel
2018-Jan-30 21:19 UTC
[Rd] CRAN indices out of whack (for at least macOS)
I have received three distinct (non-)bug reports where someone claimed a recent package of mine was broken ... simply because the macOS binary was not there.

Is there something wrong with the cronjob providing the indices? Why is it pointing people to binaries that do not exist?

Concretely, file

    https://cloud.r-project.org/bin/macosx/el-capitan/contrib/3.4/PACKAGES

contains

    Package: digest
    Version: 0.6.15
    Title: Create Compact Hash Digests of R Objects
    Depends: R (>= 2.4.1)
    Suggests: knitr, rmarkdown
    Built: R 3.4.3; x86_64-apple-darwin15.6.0; 2018-01-29 05:21:06 UTC; unix
    Archs: digest.so.dSYM

yet the _same directory_ only has:

    digest_0.6.14.tgz    15-Jan-2018 21:36    157K

I presume this is a temporary accident.

We are all spoiled by you all providing such a wonderfully robust and well-oiled service---so again big THANKS for that---but today something is out of order.

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
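The mismatch reported above can be checked mechanically. The following is only a minimal sketch, not anything CRAN itself runs: the helper names (`parse_stanza`, `expected_binary`, `index_is_consistent`) are made up for illustration, the stanza parser ignores continuation lines, and the directory listing is reduced to a set of file names. The package name, version, and file name are the ones from the report.

```python
def parse_stanza(text):
    """Parse one 'Key: value' stanza of a PACKAGES index into a dict.

    Simplification: continuation lines are not handled.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def expected_binary(fields, ext=".tgz"):
    """File name implied by an index entry: <Package>_<Version><ext>."""
    return f"{fields['Package']}_{fields['Version']}{ext}"


def index_is_consistent(stanza_text, directory_listing):
    """True if the file implied by the index entry is in the listing."""
    return expected_binary(parse_stanza(stanza_text)) in directory_listing


# Values taken from the report above.
stanza = """\
Package: digest
Version: 0.6.15
Depends: R (>= 2.4.1)
Suggests: knitr, rmarkdown
"""
listing = {"digest_0.6.14.tgz"}

wanted = expected_binary(parse_stanza(stanza))
consistent = index_is_consistent(stanza, listing)
print(wanted, consistent)
```

With the values from the report, the index implies `digest_0.6.15.tgz`, which is not in the listing, so the check fails in the same way the installations did.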
Dirk Eddelbuettel
2018-Jan-31 18:34 UTC
[Rd] CRAN indices out of whack (for at least macOS)
Bumping this as we now have two more issue tickets filed and a fresh SO question.

Is anybody looking at this? Simon?

Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
Dirk,

yes, thanks, the edge server that serves the Mac binaries to CRAN has run out of disk space (due to the size of CRAN itself), so the sync was incomplete. It is fixed now -- you can check by using the macOS master server as a mirror: https://r.research.att.com/ and it will propagate through the other mirrors as usual.

Thanks,
Simon
Although it may not have been the cause of this particular index inconsistency, there are other causes of intermittent index inconsistencies. They could be avoided if there were a different directory structure on CRAN servers.

One of the causes of inconsistencies is caching. With cloud.r-project.org (note that this is not cran.r-project.org), there is a CDN in front of the server; the CDN has caching endpoints around the world, and will serve files to the user from the nearest endpoint.

The cache timeout for each file is 30 minutes. Suppose a user downloads file X from some endpoint at 1:00. If the endpoint doesn't already have X in the cache, then it will fetch the file from the server, and then send it to the user. The endpoint will consider the cached file valid until 1:30. If another user requests X at 1:20, the endpoint will serve up the file from its cache without checking with the server. If someone requests X at 1:40, the endpoint will check with the server to see if its cached version is still valid (and download an updated version if necessary), then it will send the file to the user.

Because the caching is on a per-file basis, this can lead to a situation where the PACKAGES file served by an endpoint is out of sync with the .tgz package files. Imagine this scenario:

1:00 Someone downloads PACKAGES. It is not yet in the endpoint's cache, so the endpoint fetches it from the server. This version of PACKAGES says that the current version of PkgA is 1.0.

1:10 The server performs an rsync from the central CRAN mirror. It gets an updated version of PACKAGES, which says that the current version of PkgA is 2.0. The rsync also removes the PkgA_1.0.tgz file and adds PkgA_2.0.tgz.

1:20 Someone else wants to install PkgA, so their R session first downloads PACKAGES. The endpoint serves its cached copy from 1:00, which still points to PkgA_1.0.tgz. Then R tries to download PkgA_1.0.tgz; it is not in the endpoint's cache, so the endpoint tries to fetch it from the server, but the file is no longer present there, so the server returns a 404 (not found) response. The endpoint passes this to the R session, and the package installation fails.

Anyone else who tries to install PkgA (and hits the same CDN endpoint) will get the same installation failure, until the cache for PACKAGES expires at 1:30. However, another person who happens to hit another endpoint may be able to install PkgA, because each endpoint does its caching independently.

Something similar can happen even without a CDN, because download.packages() caches the contents of PACKAGES. However, that can be worked around by telling download.packages() to not use the cache, or by simply restarting R.

One reason that package installations fail in these cases is that the current version of a package is in one directory, and the old (archived) versions of a package are in another directory. If current and old versions were in the same directory, then package installation would not fail, because the file named by a stale PACKAGES would still be found at the same location.

-Winston
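The 1:00 / 1:10 / 1:20 scenario described above can be replayed with a small simulation. This is a toy model under stated assumptions, not a description of how the actual CDN is implemented: the `Origin`, `Endpoint`, and `install` names are invented for illustration, times are minutes since midnight, an expired entry is simply refetched rather than revalidated, and missing files are not cached.

```python
TTL = 30  # per-file cache lifetime in minutes, as described above


class Origin:
    """The mirror behind the CDN: a mapping of file name to content."""

    def __init__(self, files):
        self.files = dict(files)

    def get(self, name):
        return self.files.get(name)  # None stands in for a 404


class Endpoint:
    """One CDN endpoint that caches each file independently for TTL minutes."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}  # name -> (content, expiry time in minutes)

    def get(self, name, now):
        hit = self.cache.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]  # still fresh: served without asking the origin
        content = self.origin.get(name)
        if content is not None:
            self.cache[name] = (content, now + TTL)
        else:
            self.cache.pop(name, None)  # drop any expired entry
        return content


def install(endpoint, pkg, now):
    """Mimic an install: read the index, then fetch the file it names."""
    version = endpoint.get("PACKAGES", now)[pkg]
    tarball = f"{pkg}_{version}.tgz"
    return tarball, endpoint.get(tarball, now) is not None


origin = Origin({"PACKAGES": {"PkgA": "1.0"}, "PkgA_1.0.tgz": b"v1"})
edge = Endpoint(origin)

edge.get("PACKAGES", now=60)  # 1:00, index cached until 1:30
# 1:10, rsync replaces the index and swaps the package file
origin.files = {"PACKAGES": {"PkgA": "2.0"}, "PkgA_2.0.tgz": b"v2"}

stale = install(edge, "PkgA", now=80)                 # 1:20, same endpoint
elsewhere = install(Endpoint(origin), "PkgA", now=80) # 1:20, another endpoint
later = install(edge, "PkgA", now=100)                # 1:40, index expired
print(stale, elsewhere, later)
```

In this model the 1:20 install through the first endpoint asks for `PkgA_1.0.tgz` and fails, while a second endpoint with an empty cache, or the same endpoint after its copy of PACKAGES has expired, asks for `PkgA_2.0.tgz` and succeeds.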
Thierry Onkelinx
2018-Feb-05 10:31 UTC
[Rd] CRAN indices out of whack (for at least macOS)
Another benefit of Winston's proposal is that it makes it easy to install specific package versions from source. For the time being I'm using a construct like https://github.com/inbo/Rstable/blob/master/cran_install.sh to generate a Docker image.

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////

2018-02-03 20:31 GMT+01:00 Winston Chang <winstonchang1 at gmail.com>:
> One reason that package installations fail in these cases is that the
> current version of a package is in one directory, and the old
> (archived) versions of a package are in another directory. If current
> and old versions were in the same directory, then package installation
> would not fail.
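To make the two-directory point concrete for source packages: a tool that wants one specific version has to try two locations, because the current tarball sits directly under src/contrib while older ones sit under src/contrib/Archive/<package>/. The sketch below only builds the candidate URLs; it is not a reproduction of the linked cran_install.sh, and the function name and the default mirror argument are assumptions made for illustration.

```python
def candidate_source_urls(package, version, mirror="https://cloud.r-project.org"):
    """Return the URLs where a given source version may live, in lookup order."""
    base = f"{mirror.rstrip('/')}/src/contrib"
    tarball = f"{package}_{version}.tar.gz"
    return [
        f"{base}/{tarball}",                    # if this is the current version
        f"{base}/Archive/{package}/{tarball}",  # if it has been archived
    ]


urls = candidate_source_urls("digest", "0.6.14")
for url in urls:
    print(url)
```

If current and archived versions shared one directory, as proposed above, the list would collapse to a single URL and no fallback would be needed.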