Daniel Kelley
2014-Jan-28 23:49 UTC
[Rd] how to unbreak a circular package dependence (S4 class data)
I have an issue with a circular package dependence that prevents building/checking, and I seek advice on breaking the circle so the packages can pass the build-check tests that are required for CRAN submission.

The package pair I'm working with is slow to build, but my tests suggest the issue may be general, and so I will explain it in general terms.

Suppose there are two packages:

1. Foo, a package that defines some data types with S4 classes.

2. Foodata, a package that provides such datasets, for use by Foo.

With this setup, it seems reasonable that Foo "depends" on Foodata, so the data can be used in Foo and its documentation.

Since the data within Foodata are S4 classes as defined in Foo, an attempt to build-check Foodata will produce an error unless Foo is present. But Foo cannot be built unless Foodata exists, since it depends on it. Thus neither Foo nor Foodata can be built and checked.

One solution would be to wrap the Foo documentation examples (and relevant Foo code) in require() blocks, and to make Foo "suggest" Foodata, not "depend" upon it. My question is whether this is the recommended practice, or the common practice.

Thanks in advance to anyone who wishes to offer hints.

PS. The problem arose from an attempt to reduce CRAN load by extracting the datasets that had been contained within a previous version of Foo.

PPS. My (slow-building) packages are on github and I can supply details if needed.

Dan E. Kelley
Professor, Oceanography Department
Dalhousie University, Canada
Dan.Kelley@Dal.CA
Kasper Daniel Hansen
2014-Jan-29 00:43 UTC
[Rd] how to unbreak a circular package dependence (S4 class data)
This question is quite common in Bioconductor because of the extensive use of S4 and because our data are often too big to stay within the size requirements on software packages (we separate packages into software and data, with a size limit of 5MB on the final source tarball for software packages, but no limit for data packages).

The solution we use is to let Foo suggest Foodata and then wrap every example in

  if(require(Foodata)) {
    CODE
  }

This is exactly one of the possibilities you mention in your post. As I see it, Foodata has to Depend on Foo because it has data defined using the classes in Foo.

R-exts 1.1.3 says (about the Suggests field)
  "The 'Suggests' field uses the same syntax as 'Depends' and lists packages that are not necessarily needed. This includes packages used only in examples, tests or vignettes".

Bioc packages I have authored that follow this setup are
  minfi/minfiData
  bsseq/bsseqData
but there are other examples by other authors (which I cannot recall off the top of my head).

Best,
Kasper
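The Suggests-plus-guard pattern described above can be sketched roughly as follows. This is a minimal illustration, not code from any of the packages named in the thread: run_if_available() is a hypothetical helper, the DESCRIPTION fields in the comments are the assumed arrangement, and requireNamespace() is swapped in for require() so that nothing is attached to the search path.

```r
# Sketch only: Foo and Foodata are the placeholder names from this thread,
# and run_if_available() is a hypothetical helper, not part of R.
#
# Assumed DESCRIPTION fields:
#   Foo:      Suggests: Foodata
#   Foodata:  Depends: Foo

run_if_available <- function(pkg, code) {
  # requireNamespace() returns FALSE instead of raising an error when
  # the package is not installed.
  if (requireNamespace(pkg, quietly = TRUE)) {
    code  # lazy evaluation: the argument is only evaluated in this branch
  } else {
    message("Package '", pkg, "' is not installed; skipping example.")
    invisible(NULL)
  }
}

# "stats" ships with R, so the guarded code runs.
present <- run_if_available("stats", stats::median(c(1, 5, 9)))

# A package name assumed not to be installed: the code is skipped
# and stop() is never evaluated.
absent <- run_if_available("noSuchFoodataPackage", stop("never evaluated"))
```

In a help page for Foo, the guarded code would be whatever loads and uses a dataset from Foodata, so that checking Foo succeeds whether or not Foodata is installed.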
Hervé Pagès
2014-Jan-29 01:01 UTC
[Rd] how to unbreak a circular package dependence (S4 class data)
Hi Daniel,

I've learned by experience that it's generally better (although not always possible) to avoid putting serialized S4 objects in a data package. They will break if you need to modify the internals of the class even slightly (and chances are high that you will at some point). It is better to store the data in a format that is more or less guaranteed to remain the same for years (SQLite, XML, hdf5, plain text, serialized data frame, SAM/BAM, etc.) and to come up with a fast way to load the data and turn it into an S4 object on demand. That is not always possible if the data is huge, but for the purpose of using it in Foo's examples and vignette, do you really need huge data?

Another advantage of this approach is that the data can then be shared more easily, because it can be accessed with tools other than yours, e.g. tools that don't know about S4 and even non-R tools.

Cheers,
H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
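The idea of shipping the data in a stable format and building the S4 object on demand might look something like the sketch below. Everything named here is an assumption made for illustration: the FooSeries class, its slots, and load_foo_series() are hypothetical, and a temporary CSV file stands in for a file that a data package would more likely install under inst/extdata and locate with system.file().

```r
library(methods)

# Hypothetical class standing in for one of the S4 classes defined in Foo.
setClass("FooSeries", representation(time = "numeric", value = "numeric"))

# A temporary plain-text file stands in for the data shipped by Foodata.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(time = c(0, 1, 2), value = c(10.5, 11.0, 9.8)),
          csv_path, row.names = FALSE)

# Hypothetical loader: read the plain-text data and construct the S4
# object at load time, so no serialized S4 instance is stored on disk.
load_foo_series <- function(path) {
  raw <- read.csv(path)
  new("FooSeries",
      time  = as.numeric(raw$time),   # read.csv may return integers here
      value = as.numeric(raw$value))
}

series <- load_foo_series(csv_path)
```

With this arrangement a change to the class internals only requires updating the loader, and the CSV file remains readable by tools that know nothing about S4.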