No attempt to summarize the thread, but a few highlighted points:
o Karl's suggestion of versioned / dated access to the repo by adding a
layer to webaccess is (as usual) nice. It works on the 'supply' side. But
Jeroen's problem is on the demand side. Even when we know that an
analysis was done on 20xx-yy-zz, and we reconstruct CRAN as of that day,
it only gives us a 'ceiling' estimate of what was on the machine. In
production or lab environments, installations get stale. Maybe packages
were already a year old? To me, this is an issue that needs to be
addressed on the 'demand' side, by the user. But just writing out version
numbers is not good enough.
o Roger correctly notes that R scripts and packages are just one issue.
Compilers, libraries and the OS matter. To me, the natural approach these
days would be to think of something based on Docker or Vagrant or (if you
must) VirtualBox. The newer alternatives make snapshotting very cheap
(e.g. by using Linux LXC). That approach reproduces a full environment as
best as we can while still ignoring the hardware layer (and some readers
may recall the infamous Pentium bug of two decades ago).
o Reproducibility will probably remain the responsibility of study
authors. If an investigator on a mega-grant wants to (or needs to) freeze,
they do have the tools now. Letting the needs of a few push work onto
those already overloaded (i.e. CRAN) and change the workflow of everybody
is a non-starter.
o As Terry noted, Jeroen made some strong claims about exactly how flawed
the existing system is and keeps coming back to the example of 'a JSS
paper that cannot be re-run'. I would really like to see empirics on
this. Studies of reproducibility appear to be publishable these days, so
maybe some enterprising grad student wants to run with the idea of
actually _testing_ this. We may be above Terry's 0/30 and nearer to
Kevin's 'low'/30. But let's bring some data to the debate.
o Overall, I would tend to think that our CRAN standards of releasing with
tests, examples, and checks on every build and release already do a much
better job of keeping things tidy and workable than in most if not all
other related / similar open source projects. I would of course welcome
contradictory examples.
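
To make the 'demand' side point above a little more concrete, here is a
minimal sketch of recording what is actually installed on a machine when an
analysis is run, rather than what CRAN offered that day. The helper name
snapshot_installed and the output file name are made up for illustration;
installed.packages(), R.version and saveRDS() are base R. Take it as one
possible starting point under those assumptions, not as a worked-out
solution.

```r
## Hypothetical helper (not from the thread): record what is actually
## installed on this machine, rather than what CRAN offered on a given day.
snapshot_installed <- function(lib.loc = .libPaths()) {
    ## 'Packaged' is the DESCRIPTION field stamped by R CMD build; it may
    ## be NA for some packages.
    ip <- installed.packages(lib.loc = lib.loc, fields = "Packaged")
    rownames(ip) <- NULL   # a package may appear in more than one library
    cols <- c("Package", "Version", "Packaged", "Built", "LibPath")
    list(r_version = R.version.string,
         platform  = R.version$platform,
         packages  = as.data.frame(ip[, cols, drop = FALSE],
                                   stringsAsFactors = FALSE))
}

snap <- snapshot_installed()
## Store the snapshot next to the analysis results (file name is arbitrary).
saveRDS(snap, file.path(tempdir(), "installed-snapshot.rds"))
```

This only covers the R package layer. Compilers, system libraries and the
OS are outside its reach, which is where the container or VM route comes
in.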
Dirk
--
Dirk Eddelbuettel | edd at debian.org | http://dirk.eddelbuettel.com