On Tue, 26 Aug 2003, Ross Boylan wrote:
> I have two questions about packaging up code.
>
> 1) Weave/tangle advisable?
> In the course of extending some C code already in S, I had to work out
> the underlying math. It seems to me useful to keep this information
> with the code, using Knuth's tangle/weave type tools. I know there is
> some support for this in R code, but my question is about the wisdom of
> doing this with C (or Fortran, or other source) code.
>
> Against the advantage of having the documentation and code nicely
> integrated are the drawbacks of added complexity in the build process
> and portability concerns. Some of this is mitigated by the existing
> dependence on TeX.
There is none. We don't assume a working latex/tex, although some manuals
will not be produced without working (pdf)latex (or texinfo->pdf).
One quick comment: the pre-compiled packages (for Windows now and MacOS X
for the next release) are produced automatically without user
intervention. So if you want to have a package on CRAN, it needs to work
out of the box, and there is no dependence on TeX, let alone weave/tangle,
in the standard procedure.
> An intermediate approach would be to provide both the web (in the Knuth
> sense) source and the C output; the latter could be used directly by
> those not wishing to hassle with web. This isn't ideal, since the
> resulting C is likely to be a bit cryptic, and if someone edits the C
> without changing the web source confusion will reign.
>
> So do people have any thoughts about whether introducing this is a step
> forward or back?
A useful analogue: we now distribute the Fortran code, not the original Ratfor.
> 2) Modifications of existing packages.
> I modified the survival package (I'm not sure if that's properly called
> a "base" package, but it's close). I know in this particular case, if
It's a `recommended' package, as the DESCRIPTION file says. There is a
base package, and several standard packages bundled with R, which have
priority "base" and are often called `base packages'.
> I'm serious, I probably should contact the package maintainer. But this
> kind of operation will probably be pretty common for me; I imagine many
> on this list have already done it. In general, is the best thing to do
> a) package the new routines as a small additional package, with a
> dependence on the base package if necessary (the particular change I've
> made actually produces a few distinct files, slight tweaks of existing
> ones, that can stand on their own)
> b) package the new things in with the old under the same name as the old
> (obviously requires working with package maintainer)
> c) package the new things with the old and give it a new name.
>
> I'm also curious about what development strategy is best; I did b), and
> it seemed to work OK. But I kept expecting it to cause disaster (it
> probably helped that I usually didn't load the baseline survival
> packages; clearly that wouldn't be an option if working with one of the
> automatically loaded packages).
I think a) is the best, including changing the names of any R functions
you alter, and changing the entry points in any compiled code you alter.
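As a rough sketch of what renaming an entry point might look like, assume
an add-on package called `mysurv' that carries an altered copy of a C
routine. The package name, the routine name and the routine itself are
all hypothetical and are not taken from survival; the point is only that
the altered routine gets a prefixed symbol of its own, so it cannot be
confused with the original when both packages are loaded.

```c
/* Hypothetical example: not code from the survival package.
 *
 * Suppose the original package exported a routine named cumsum_dbl
 * and we have altered it.  Giving our copy a package-specific prefix
 * (mysurv_) keeps the two entry points distinct.
 *
 * The routine follows the .C() convention: it returns void, every
 * argument is a pointer, and results come back through an output
 * argument allocated by the caller.
 */
void mysurv_cumsum_dbl(double *x, int *n, double *out)
{
    double total = 0.0;
    int i;

    for (i = 0; i < *n; i++) {
        total += x[i];   /* running sum of the inputs so far */
        out[i] = total;
    }
}
```

The matching R wrapper would also get a new name and would call the new
symbol, along the lines of
.C("mysurv_cumsum_dbl", as.double(x), as.integer(length(x)),
out = double(length(x)), PACKAGE = "mysurv")$out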
Package maintainers may have very good reasons not to go along with b),
including their not being the original authors (true for survival),
workload, lack of interest in the proposed changes, complications of
ownership and copyright, ....
c) is, I believe, unwise. It may be allowed by the licence (or may not) but
in the couple of cases where I have seen it done it did not give anything
like adequate credit to the original authors (who were never consulted)
and the modified code distributed was out-of-date when originally
released, let alone now.
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595