Dirk Eddelbuettel
2003-Dec-31 03:51 UTC
[Rd] RFC on first public draft of 'Debian R Policy'
r-devel and debian-devel readers:

Below is a draft for a suggested policy for R packages within Debian.

In the six years that we have been maintaining R for Debian, the total number of R-related packages has grown to a full thirty -- eleven based on the main tarball released by R Core, as well as nineteen contributed packages -- reflecting the work of five different Debian maintainers. This draft document is concerned mostly with how we can assure the integrity of the contributed packages.

This is a timely concern: R-based archives such as CRAN (cran.r-project.org) and BioConductor (www.bioconductor.org) are experiencing unprecedented growth in the number of their packages. More and more of these may eventually be turned into Debian packages. We would like to suggest some mechanisms to ensure consistency, similar to what Debian has achieved with add-on packages based on Perl, Python or Emacs Lisp.

We're looking forward to your comments. It may be beneficial for the flow of the discussion to carbon-copy both mailing lists.

With best regards, and Happy New Year,

Dirk Eddelbuettel and Doug Bates


Debian R Policy
Draft Proposal - v 0.1.3

Dirk Eddelbuettel (edd@debian.org) and Douglas Bates (bates@debian.org)
December 30, 2003

0. Introduction:

The `r-base' package for Debian GNU/Linux (http://www.debian.org) has been available since December 1997 and provides R (http://www.r-project.org), a language and environment for statistical computing and graphics. Like Debian, R has grown considerably since then. The r-base package is now a small meta package that depends on several other Debian packages, including r-base-core which provides the essential parts of the R environment.

The R system has its own concept of packages that provide base functionality and extensions for the language. The Debian r-base-core package installs 15 required R packages in the directory /usr/lib/R/library.
The Debian package r-recommended installs another 13 R packages in the same directory.

The creation of user-written R packages is explicitly encouraged by the R Core team. The format of R packages, as documented in the manual "Writing R Extensions", available in the Debian r-doc-html and r-doc-pdf packages, is loosely based on the Debian packaging format.

CRAN (http://cran.r-project.org), the Comprehensive R Archive Network (a network of global mirrors), already contains well over 300 contributed R packages. CRAN is patterned after archive networks such as CPAN (http://www.cpan.org) and CTAN (http://www.ctan.org).

While CRAN constitutes the principal source of R packages, many other R packages are available in specialized archives such as Bioconductor (http://www.bioconductor.org), Omegahat (http://www.omegahat.org), and Sourceforge (http://sourceforge.net) as well as private archives. One R package emanating from a private archive has already been released as the Debian package r-noncran-lindsey. Debian currently contains a total of nineteen add-on packages for R.

The purpose of this document is to propose standards for creating Debian packages of R packages.

0.1 Terminology:

The term "package" may mean either an R package or a Debian package. When necessary we will distinguish between these by using the full names "R package" and "Debian package".

An R library is a file system directory that contains a collection of R packages. A search path of libraries is maintained during an R session. The library search path is initialized at startup from the environment variable `R_LIBS', which (if defined) should be a colon-separated list of directories at which R library trees are rooted. On Debian systems, R environment variables are typically only defined inside /usr/bin/R, but can be defined by the user if needed, as in the case of private libraries below $HOME.
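To make the colon-separated search path concrete, here is a minimal shell sketch. It is not taken from /usr/bin/R; the function name list_r_libs is hypothetical, and the fallback value is simply the default quoted in section 2.2 of this draft. It prints one library directory per line, in search order.

```shell
#!/bin/sh
# Hypothetical helper (not part of R or Debian): print the R library
# search path, one directory per line, in the order R would search it.
list_r_libs() {
    # Use R_LIBS if it is set; otherwise fall back to the default
    # quoted in section 2.2 of this draft.
    libs=${R_LIBS-'/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library'}
    # R_LIBS is colon-separated, so mapping ':' to a newline splits it.
    printf '%s\n' "$libs" | tr ':' '\n'
}

list_r_libs
```

A user with a private library below $HOME could run, for instance, `R_LIBS=$HOME/Rlibs:/usr/lib/R/library list_r_libs` to see the two directories in the order given.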
Starting with the Debian pre-releases of R 1.7.0, R_LIBS was set up for three distinct directories: user-installed packages are installed in /usr/local/lib/R/site-library, Debian packages install into /usr/lib/R/site-library, and the r-base-core and r-recommended packages use the standard /usr/lib/R/library directory. See section 2.2 below for more details.

As described in "Writing R Extensions", an R package should contain data sets and/or R code, and R help files in Rd (R documentation) format. Optionally an R package can contain C/C++/Fortran code, regression tests, and/or expository documents, called "vignettes", that are written in a noweb format. Source R packages are distributed as gzip'd tar files named according to the package, version and sub-version, e.g. car_1.0-0.tar.gz.

In what follows, the R prompt, "> ", will be used to distinguish example commands typed to the R interpreter from those typed to the shell.

1.0 Installing R packages

The primary executable installed by the Debian package r-base-core is the shell script /usr/bin/R. Invoked by itself it starts an R session. It is also the tool for managing R packages. A source package would be installed as

    ## install in the local system-wide packages library location
    ## note that the directory /usr/local/lib/R/site-library is now
    ## used automatically on Debian systems, see section 2.2 below
    R CMD INSTALL car_1.0-0.tar.gz

or

    ## install in a private library ~/Rlibs
    R CMD INSTALL -l ~/Rlibs car_1.0-0.tar.gz

When run on a computer connected to the Internet, the R system provides an interface to CRAN. One can install or update packages by

    > install.packages("car")
    > # update currently installed packages to the latest available
    > # version on CRAN
    > update.packages()

Naturally, installation of an R package in a library directory requires write permission on the library directory.

1.1 Why provide Debian packages of R packages?
One reason for providing a Debian package of an R package is to use Debian package dependencies to ensure that any system libraries or include files required to compile the code in the R package are available. For example, the Debian postgresql-dev package must be installed if the R package Rpgsql is to be installed successfully.

The second reason is convenience. Someone who already uses Debian tools such as apt-get to update the packages on a Debian system may find installing or updating a Debian package to be more convenient than installing the r-base Debian package plus learning to update R packages from within R or externally using R CMD INSTALL. Because R is beginning to be used more widely in fields such as biology (e.g. Bioconductor) and the social sciences, we should not count on the typical user being an R guru. Having R packages controlled by apt seems worth the small amount of overhead in creating the Debian packages. This also applies to systems maintained by (presumably non-R using) system administrators who may already be more familiar with Debian's package mechanism. By using this system to distribute CRAN packages, another learning curve is avoided for those who may not actually use R but simply provide it for others.

The third reason is quality control. The CRAN team already goes to great lengths to ensure the individual quality and coherence of an R package. Embedding a binary R package in the Debian package management system provides additional control over dependencies between required components or libraries, as well as access to a fully automated system of `build daemons' that recompile a source package for up to ten other architectures -- which provides a good portability and quality control test.

The fourth reason is scalability. More and more users are using several machines, or may need to share work with co-workers.
Being able to create, distribute and install identical binary packages makes it easier to keep machines synchronised in order to provide similar environments.

The fifth reason plays on Debian's strength as a common platform for other 'derived' systems. Examples are Knoppix and its derivatives such as Quantian. Providing Debian packages of R packages allows others to use these in new environments.

2.0 Proposed conventions for Debian packages of R packages.

2.1 Name and version number of the Debian package

We propose that the Debian packages be named r-<Rarchive>-<Rpackage>. An R package from a private archive can use "noncran" for the archive name indicating that it did not come from CRAN. For example

    r-cran-car          - the car package from CRAN
    r-bioconductor-affy - the affy (Affymetrix) package from Bioconductor.org
    r-omegahat-rgtk     - the Gtk bindings package from Omegahat.org
    r-noncran-lindsey   - a package from Jim Lindsey's private archive.

This determines the name of the binary Debian package. The Debian source package can in most cases retain the <Rpackage> name. E.g., for the examples above one could use car, affy, rgtk and lindsey. This makes it consistent with the upstream archive: CRAN mirrors will have a current tar.gz file with sources for car, and so will Debian mirrors. As a general rule, Debian package names have to be in lowercase, and any dots should be replaced with hyphens. If the potential for name-space collision with other packages is sufficient, then the binary package name can instead be used as the source package name.

One new category 'other' may be introduced for packages not originating from CRAN, BioConductor or OmegaHat. We suggest using the scheme r-other-$AUTHOR-$PACKAGE for these. For example, the package by Jim Lindsey could be re-released as a set of packages which would comprise 'r-other-lindsey-rmutil', 'r-other-lindsey-gnlm' and so on.
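The naming rule above can be sketched in a few lines of shell. This is only an illustration under the conventions proposed here, not an existing tool: the function name deb_binary_name is hypothetical, and 'Some.Pkg' is a made-up package name used to show the dot-to-hyphen rule.

```shell
#!/bin/sh
# Hypothetical helper: build a binary package name of the form
# r-<Rarchive>-<Rpackage> from an archive label and an R package name.
deb_binary_name() {
    archive=$1
    rpackage=$2
    # Debian package names are lowercase, and dots become hyphens.
    printf 'r-%s-%s\n' "$archive" "$rpackage" \
        | tr '[:upper:]' '[:lower:]' \
        | tr '.' '-'
}

deb_binary_name cran car          # r-cran-car
deb_binary_name cran RODBC        # r-cran-rodbc
deb_binary_name other Some.Pkg    # r-other-some-pkg (made-up name)
```

The same two tr steps could be applied to the package name alone when choosing a lowercase source package name.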
Version numbers should allow for a final Debian revision to permit uploads independent of CRAN uploads. Given that CRAN packages use a scheme `a.b-c', adding a Debian revision d leads to the scheme `a.b-c-d', which looks unusual due to the double hyphens. Alternatively, the hyphen in the CRAN version can be translated into a dot, yielding `a.b.c-d'. Both formats are permitted.

2.2 Installation directory

Only the r-base-core and r-recommended packages should install an R package into the library /usr/lib/R/library. Other Debian packages should install into the library /usr/lib/R/site-library. User-installed packages should go into /usr/local/lib/R/site-library.

Using the default setting of

    R_LIBS=${R_LIBS-'/usr/local/lib/R/site-library:/usr/lib/R/site-library:/usr/lib/R/library'}

ensures that user installations will go into /usr/local/lib/R/site-library as this is the first directory listed in R_LIBS. Debian packages will explicitly set the library /usr/lib/R/site-library in their debian/rules files (see below). Lastly, default Makefiles for R released by R Core will continue to use /usr/lib/R/library.

One advantage of using separate libraries for separate archives is that the packages are listed according to library directories by R's library() function.

Note that further directories should be added to R_LIBS only after at least some informal consultation with the Debian R maintainers.

2.3 Interaction with other package managers

In the case of a Debian system, a possible conflict exists between the standard Debian way of managing packages with apt-get and friends, and the common R way of using 'update.packages()'. For example, a user may install a package via apt-get, but later call (as root) the update.packages() function from within R. This could invalidate the package information stored by dpkg.

As a first line of defense, we will rely on common sense: mixing package management systems cannot be construed as a good idea.
As a second line of defense, it is anticipated that future R versions will add a new entry record 'Package-Manager: Debian' to the installed DESCRIPTION file of each package. This allows update.packages() to at a minimum warn about an attempted upgrade of a Debian-installed package, or possibly even to refuse to execute such an upgrade. On the other hand, such a mechanism still allows for direct installation from CRAN for packages that are not (yet?) available as Debian packages.

3.0 Template

The following section provides a template for the debian/ directory of a Debian package created from an R package.

Medium-term, and with a modest amount of effort on our part, the process of creating a Debian package from an R package on CRAN could be automated to a large extent, leaving the possibility of eventually providing a tool like dh-make-r akin to the existing dh-make-perl which aids in creating Debian packages from CPAN modules. This has become a lot easier thanks to the cdbs package and its ability to reduce a debian/rules file to a few lines. In fact, Albrecht Gebhardt at the University of Klagenfurt has already built a mostly complete infrastructure to do this, and we intend to eventually provide Debian packages via an apt-get'able section of the CRAN mirror network.

Packaging for Debian has become fairly exhaustively documented, for example under the 'Packaging' section of http://www.debian.org/devel/ where the Debian Policy manual (http://www.debian.org/doc/debian-policy) as well as the Developers Reference (http://www.debian.org/doc/developers-reference) can be found. This section does not aim to be a substitute for these. Indeed, we recommend following the policy and reference manuals if they conflict with this section.

3.1 Overview

Sources for an R package are normally supplied as a compressed tar archive. Converting this into a Debian package can be as easy as adding a directory debian/ to the main directory from the untarred archive.
At a minimum, four files must be in the debian/ directory.

One distinction (that has to be made at the beginning) relates to whether the package will be architecture dependent or independent. R packages that do not contain any C, C++ or Fortran source code are architecture independent, which is reflected in the package file name ending in `_all.deb'. As such, they do not need to be rebuilt on different machine architectures by the Debian build system. On the other hand, packages containing source code that is to be compiled and linked must be rebuilt for different architectures.

For R packages, the difference between the two setups concerns mostly the file debian/control. The file debian/rules would normally be affected too. However, in the case of R, the implicitly architecture-neutral way of building R packages via 'R CMD INSTALL' actually bridges between both types of packages and permits us to rely on a single template for both cases.

3.2 Discussion

Below, we discuss each of the required files briefly along with an example from the Debian r-cran-car package of the car CRAN package. To illustrate an architecture-dependent package, we will also show one file from the r-cran-rodbc package.

3.2.1 debian/control

3.2.1.1 debian/control for an architecture-independent package

This file contains metainformation about the package and is similar in spirit to the DESCRIPTION file of an R package.

The control file typically contains two sections. The first section provides the name of the source package (i.e. name of the upstream tarball stripped of its version numbers) as well as the maintainer name and email. It also suggests both a section and priority for the Debian archive, and finally supplies the minimum set of packages (beyond the build-essential package) needed to build the package. Note that the field is called 'Build-Depends-Indep' as it lists the build requirements for an architecture-independent package, the car package.
The second section describes the resulting package(s). For each package, it provides its name, the architecture (where 'all' signals that no recompilation on other hardware platforms is needed), a textual description as well as the dependencies -- in this case only R itself.

...........................................................................
Source: car
Section: math
Priority: optional
Maintainer: Dirk Eddelbuettel <edd@debian.org>
Build-Depends-Indep: debhelper (>> 4.1.0), r-base-dev (>> 1.7.1), cdbs
Standards-Version: 3.6.1.0

Package: r-cran-car
Architecture: all
Depends: r-base-core (>= 1.7.1)
Description: GNU R Companion to Applied Regression by John Fox
 This package accompanies J. Fox, An R and S-PLUS Companion to Applied
 Regression, Sage, 2002. The package contains mostly functions for applied
 regression, linear models, and generalized linear models, with an emphasis
 on regression diagnostics, particularly graphical diagnostic methods. There
 are also some utility functions.
...........................................................................

3.2.1.2 debian/control for an architecture-dependent package

For an architecture-dependent package, the setup is similar. The build dependencies are listed in a field 'Build-Depends'. This information is essential for the automated build infrastructure which compiles Debian packages for a variety of hardware platforms. The `Architecture: any' setting shows that the package needs to be rebuilt. Dynamic libraries from standard locations are automatically identified and inserted via the ${shlibs:Depends} substitution variable. As R uses a private directory for its dynamic library, a dependency on R has to be added explicitly.

...........................................................................
Source: rodbc
Section: math
Priority: optional
Maintainer: Dirk Eddelbuettel <edd@debian.org>
Build-Depends: debhelper (>>4.1.0), cdbs, r-base-dev (>> 1.7.1), unixodbc-dev
Standards-Version: 3.6.1.0

Package: r-cran-rodbc
Architecture: any
Depends: ${shlibs:Depends}, r-base-core (>= 1.7.1)
Suggests: odbc-postgresql, libmyodbc
Description: GNU R package for ODBC database access
 This CRAN package provides access to any Open DataBase Connectivity (ODBC)
 accessible database.
 .
 The package should be platform independent and provide access to any
 database for which a driver exists. It has been tested with MySQL and
 PostgreSQL on both Linux and Windows (and to those DBMSs on Linux hosts
 from R under Windows), Microsoft Access, SQL Server and Excel spreadsheets
 (read-only), and users have reported success with connections to Oracle
 and DBase.
 .
 Usage is covered in the R Data Import/Export manual (available via the
 r-doc-pdf, r-doc-html and r-doc-info packages).
...........................................................................

3.2.2 debian/copyright

The copyright file typically consists of three sections. First, information about the package, its author and purpose are briefly stated. Second, the canonical source of the package is identified. Third, the copyright information is stated. As Debian adheres to the Debian Free Software Guidelines (http://www.debian.org/social_contract#guidelines), only software that matches these criteria can be added to the Debian archive. As CRAN follows a similar spirit, most R packages should be suitable but packagers of prospective R packages should be careful to ensure that the R package is DFSG-free.

...........................................................................
This is the Debian GNU/Linux r-cran-car package of car, the Companion to
Applied Regression package for GNU R. Car was written by John Fox.

This package was created by Dirk Eddelbuettel <edd@debian.org>.
The sources were downloaded from
    http://cran.us.r-project.org/src/contrib/

The package was renamed from its upstream name 'car' to 'r-cran-car'
to fit the pattern of CRAN (and non-CRAN) packages for R.

Car is copyright John Fox and released under the GNU General Public License
(GPL).

On a Debian GNU/Linux system, the GPL license is included in the file
/usr/share/common-licenses/GPL.

For reference, the upstream DESCRIPTION [with lines broken to 80 cols]
file is included below:

   Package: car
   Version: 1.0-5
   Date: 2003/5/26
   Title: Companion to Applied Regression
   Author: John Fox <jfox@mcmaster.ca>. I am grateful to Douglas Bates,
     David Firth, Michael Friendly, Georges Monette, Brian Ripley, and
     Sanford Weisberg for various suggestions.
   Maintainer: John Fox <jfox@mcmaster.ca>
   Depends: R (>= 1.7.0), modreg
   Description: This package accompanies J. Fox, An R and S-PLUS Companion
     to Applied Regression, Sage, 2002. The package contains mostly
     functions for applied regression, linear models, and generalized
     linear models, with an emphasis on regression diagnostics,
     particularly graphical diagnostic methods. There are also some utility
     functions. With some exceptions, I have tried not to duplicate
     capabilities in the basic distribution of R, nor in widely used
     packages. Some of the functions in car will use functions in the MASS
     package, if it is present; the subsets function graphs objects
     produced by the regsubsets function in the leaps package. Where
     relevant, the functions in car are consistent with
     na.action = na.omit or na.exclude.
   License: GPL version 2 or newer
   URL: http://www.r-project.org, http:/www.socsci.mcmaster.ca/jfox/
...........................................................................

3.2.3 debian/changelog

This file details the changes made to the package. Command-line tools for adding to it, as well as a full-featured Emacs mode are available.

...........................................................................
car (1.0.9-1) unstable; urgency=low

  * Upgraded to new upstream release
  * debian/rules: Minor update moving towards common cdbs file

 -- Dirk Eddelbuettel <edd@debian.org>  Wed, 10 Dec 2003 20:52:29 -0600

car (1.0.8-1) unstable; urgency=low

  * New upstream release
  * debian/rules: Updated moving towards common cdbs file
  * debian/control: Increased Standards-Version to 3.6.1.0

 -- Dirk Eddelbuettel <edd@debian.org>  Mon, 13 Oct 2003 22:25:00 -0500

car (1.0.7-1) unstable; urgency=low

  * New upstream release
  * debian/control: Standards-Version increased to 3.6.0.1
  * debian/rules: Rewritten build stage with 'R CMD INSTALL .'
  * debian/control: Build-Depends-Indep on r-base-dev (>> 1.7.1)

 -- Dirk Eddelbuettel <edd@debian.org>  Sat, 23 Aug 2003 12:41:04 -0500

car (1.0.5-1) unstable; urgency=low

  * Initial Debian Release

 -- Dirk Eddelbuettel <edd@debian.org>  Sat,  5 Jul 2003 13:48:44 -0500
...........................................................................

3.2.4 debian/rules

This file provides the nuts and bolts of the actual packaging. It is written in the GNU Make language, and often employs additional Debian tools such as debhelper. In a nutshell, it provides a framework for the common 'configure; make; make install; make clean' cycle of installing software. However, the installation actually happens to a subdirectory of the actual build directory. The resulting filesystem tree is then wrapped into a tarball which, along with control and metainformation as well as the possible pre- and post-installation and removal scripts, is placed into an ar archive ending in the .deb suffix.

For our purposes, only a few key lines matter as R CMD INSTALL does all the R package building work. Moreover, as 'R CMD INSTALL' is invoked for both architecture-dependent and architecture-independent packages, we have adapted debian/rules accordingly. By using the cdbs build system, we can in fact use the same debian/rules file for a variety of packages.
...........................................................................
#!/usr/bin/make -f
# -*- makefile -*-
# debian/rules file for the Debian/GNU Linux r-cran-car package
# Copyright 2003 by Dirk Eddelbuettel <edd@debian.org>

include /usr/share/cdbs/1/rules/debhelper.mk
include /usr/share/cdbs/1/class/langcore.mk

## We need the CRAN (upstream) name
cranName   := $(shell grep Package: DESCRIPTION | cut -f2 -d" ")
## and we need to build a Debian Policy-conformant lower-case package name
cranNameLC := $(shell echo $(cranName) | tr "[A-Z]" "[a-z]" | tr "." "-" )
## which we can use to build the package directory
package    := r-cran-$(cranNameLC)
## which we use for the to-be-installed-in directory
debRlib    := $(CURDIR)/debian/$(package)/usr/lib/R/site-library

common-install-indep:: R_any_arch

common-install-arch:: R_any_arch

R_any_arch:
	dh_installdirs usr/lib/R/site-library
	R CMD INSTALL -l $(debRlib) --clean .
	rm -vf $(debRlib)/R.css $(debRlib)/$(cranNameLC)/COPYING
...........................................................................

The 'package' variable would need to change from `r-cran-$(cranNameLC)' to `r-omegahat-$(cranNameLC)' for a package from Omegahat.org, and similarly for BioConductor.

A currently open question is the desire to also run 'R CMD check' at build time. However, as of R 1.8.1, this requires a minor upstream change in the R tools to allow the build to proceed from directories also containing a version number in their name. We expect to add this feature in one of the next R releases.

3.3 Putting it all together

Invoking 'dpkg-buildpackage -rfakeroot -us -uc' from inside the top-level directory of an expanded CRAN package will build the Debian package, along with .dsc, .diff.gz and .changes files (see the Debian Policy manual and the Developers Reference for more details). The resulting .changes file can then be used for a package check via the lintian tool.

4. Acknowledgements

Comments and suggestions by Albrecht Gebhardt, Frank Harrell, Kurt Hornik, Rafael Laboissiere, Friedrich Leisch, Steffen Moeller and Tony Rossini are gratefully acknowledged.

$Id: R-packages.txt,v 1.6 2003/12/31 02:30:21 edd Exp $

-- 
The relationship between the computed price and reality is as yet unknown.
                                             -- From the pac(8) manual page
Bernhard R. Link
2003-Dec-31 12:53 UTC
[Rd] Re: RFC on first public draft of 'Debian R Policy'
* Dirk Eddelbuettel <edd@debian.org> [031231 04:10]:
> Debian R Policy
> Draft Proposal - v 0.1.3
[...]
>
> December 30, 2003
[...]
> The R system has its own concept of packages that provide base functionality
> and extensions for the language. The Debian r-base-core package installs 15
> required R packages in the directory /usr/lib/R/library. The Debian package
> r-recommended installs another 13 R packages in the same directory.

This is just a small suggestion: stating explicit numbers without any qualifier will require you to adapt them with every new revision. A phrase like "currently (as of Dec 2003)" might prevent future oversights.

> 3.2.2 debian/copyright
>
> The copyright file typically consists of three sections. First,
> information about the package, its author and purpose are briefly
> stated. Second, the canonical source of the package is identified. Third,
> the copyright information is stated. As Debian adheres to the Debian Free
> Software Guidelines (http://www.debian.org/social_contract#guidelines),
> only software that matches these criteria can be added to the Debian
> archive. As CRAN follows a similar spirit, most R packages should be
> suitable but packagers of prospective R packages should be careful to
> ensure that the R package is DFSG-free.

A tip could be added here to ask debian-legal if the licence has not yet been decided upon. It sometimes confuses people that the DFSG and the OSD are so similar but are sometimes interpreted differently.

> ...........................................................................
> This is the Debian GNU/Linux r-cran-car package of car, the Companion to
> Applied Regression package for GNU R. Car was written by John Fox.
>
> This package was created by Dirk Eddelbuettel <edd@debian.org>.
> The sources were downloaded from
> http://cran.us.r-project.org/src/contrib/
[...]
> Car is copyright John Fox and released under the GNU General Public License
> (GPL).
>
> On a Debian GNU/Linux system, the GPL license is included in the file
> /usr/share/common-licenses/GPL.
>
[...]
> Author: John Fox <jfox@mcmaster.ca>. I am grateful to Douglas Bates,
> David Firth, Michael Friendly, Georges Monette, Brian Ripley, and
> Sanford Weisberg for various suggestions.
> Maintainer: John Fox <jfox@mcmaster.ca>
[...]
> License: GPL version 2 or newer
> URL: http://www.r-project.org, http:/www.socsci.mcmaster.ca/jfox/
> ...........................................................................

I think a somewhat better example would be nice here. While the information can be found further down, a note would be helpful that the copyright file should list:

 - anyone owning copyright (not only the upstream author, though that may be the same in most cases), ideally together with some way to contact them (e.g. the e-mail address given).
 - the exact license given. The "version 2" is important information, and the "or newer" could become important in the future.

Some people also demand some form with "Copyright", a C in a circle and a year of creation, as information needed by some historic international treaty, but I think that is historic even in the US.

Yours respectfully,
  Bernhard R. Link

-- 
Sendmail is like emacs: A nice operating system, but missing an editor and a MTA.