I found an issue with the data() command this evening when working on the survival package.

1. I have a lot of data sets in the package, almost all used in at least one vignette, help file, or test. As a space-saving measure, I have bundled many of them together, i.e., the file data/cancer.rda contains 19 data sets, many of them small. The resulting file (using xz compression) is quite a bit smaller than the individual ones. (I still get a warning note about size from R CMD check, but I'm no longer 2x the limit.)

2. Consider the lung data set. All of these fail:
     data(lung)
     data("lung")
     data(lung, package="survival")

  a. The lung.Rd file had \usage{data(lung)}; that error was not caught by R CMD check. (Several other .Rd files as well.)

  b. In broader examples for teaching, I sometimes load data from other packages, e.g., data(aidssi, package="mstate"). But this does not work for survival. (The larger survival data sets that are in separate .rda files can be found.)

  c. What does work is survival::lung. Might it be useful to add a comment to data.Rd to this effect?

3. Creating a separate package 'survivaldata' is of course one route, and is suggested in the "Writing R Extensions" guide. But this is not possible since survival is a recommended package: it can't load any non-recommended package for its tests or vignettes. Longer term, perhaps there is a way around this constraint?

Terry T.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
therneau at mayo.edu

"TERR-ree THUR-noh"
On 23/10/2020 9:25 p.m., Therneau, Terry M., Ph.D. via R-devel wrote:
> I found an issue with the data() command this evening when working on the survival package.
>
> 1. I have a lot of data sets in the package, almost all used in at least one vignette,
> help file, or test. As a space-saving measure, I have bundled many of them together,
> i.e., the file data/cancer.rda contains 19 data sets, many of them small. The resulting
> file (using xz compression) is quite a bit smaller than the individual ones. (I still get
> a warning note about size from R CMD check, but I'm no longer 2x the limit.)
>
> 2. Consider the lung data set. All of these fail:
>      data(lung)
>      data("lung")
>      data(lung, package="survival")
>
>   a. The lung.Rd file had \usage{data(lung)}; that error was not caught by R CMD check.
> (Several other .Rd files as well.)
>
>   b. In broader examples for teaching, I sometimes load data from other packages, e.g.,
> data(aidssi, package="mstate"). But this does not work for survival. (The larger
> survival data sets that are in separate .rda files can be found.)
>
>   c. What does work is survival::lung. Might it be useful to add a comment to data.Rd to
> this effect?

You don't describe how this dataset is being included in your package. Have you moved it from data/lung.rda to data/cancer.rda? Currently (in survival 3.2-7) each of these works for me:

  library(survival); data(lung)

  library(survival); data("lung")

  # Without library(survival):
  data(lung, package="survival")

I think if the lung dataset is now being included in cancer.rda, you'd need

  data(cancer, package="survival")

or equivalent to load it (and the rest of the datasets there).

> 3. Creating a separate package 'survivaldata' is of course one route, and is suggested in
> the "Writing R Extensions" guide. But this is not possible since survival is a
> recommended package: it can't load any non-recommended package for its tests or
> vignettes. Longer term, perhaps there is a way around this constraint?

Maybe the solution is to put your datasets into the "datasets" package, or make "survivaldata" a recommended package, or just leave things as they are and ignore the warnings about package size. I think that's a negotiation you should have with R Core.

Duncan Murdoch
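The name-matching behaviour described above can be reproduced outside any package. The following is a minimal sketch, not the survival package's actual layout: it assumes that data() called without a package argument also searches the data/ subdirectory of the working directory, and all names (toy_a, toy_b, toybundle, data_demo) are hypothetical.

```r
# Build a throwaway directory that mimics a package's data/ folder.
# All names here (toy_a, toy_b, toybundle) are hypothetical.
proj <- file.path(tempdir(), "data_demo")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)

toy_a <- data.frame(id = 1:3, time = c(5, 8, 12))
toy_b <- data.frame(id = 1:2, status = c(0, 1))

# Bundle two objects into one xz-compressed file, as with data/cancer.rda.
save(toy_a, toy_b, file = file.path(proj, "data", "toybundle.rda"),
     compress = "xz")
rm(toy_a, toy_b)

old <- setwd(proj)   # with no package given, data() also looks in ./data
env <- new.env()

# Asking for an object name fails: there is no file named toy_a.*
msg <- tryCatch(data(list = "toy_a", envir = env),
                warning = function(w) conditionMessage(w))
found_by_object <- exists("toy_a", envir = env, inherits = FALSE)

# Asking for the file's base name loads every object stored in it.
data(list = "toybundle", envir = env)
found_by_file <- all(c("toy_a", "toy_b") %in% ls(env))

setwd(old)           # restore the original working directory
```

Here found_by_object ends up FALSE and found_by_file TRUE, which mirrors data(lung) failing while data(cancer) succeeds once lung lives inside cancer.rda.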
On 24 October 2020 at 05:28, Duncan Murdoch wrote:
| they are and ignore the warnings about package size. I think that's a
| negotiation you should have with R Core.

s/R Core/CRAN/ ?

Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org
Duncan and others: I was not being careful with my description. This concerned tests of version 3.2-8, not yet on CRAN, in which I was trying some size-limiting measures. My apologies for not making this clear.

 - I feel mild pressure to make the survival package smaller, per CRAN guidelines, and shrinking the data appears to be one way to approach that. So a real point of the query is my attempts to do so. (I am much more resistant to shrinking the extensive test suite or the vignettes.)

 - The survival package has a lot of small data sets, and bundling them up into a single .rda file does save space, but it causes some issues with data(). The overall tarball goes from 7480 to 6100 in size (ls -s).

Terry
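For readers who want to gauge the space trade-off on their own data, a rough comparison can be scripted. This is only a sketch with made-up data frames (toy1 to toy5 are hypothetical stand-ins, not survival data sets); the saving will vary with the number and content of the real data sets.

```r
# Hypothetical small data frames standing in for a package's data sets.
set.seed(1)
sets <- lapply(1:5, function(i) data.frame(id = 1:20, x = round(rnorm(20), 2)))
names(sets) <- paste0("toy", 1:5)
env <- list2env(sets)   # save() needs the objects to live in an environment

out <- file.path(tempdir(), "size_demo")
dir.create(out, showWarnings = FALSE)

# Variant 1: one xz-compressed .rda file per data set.
single_files <- file.path(out, paste0(names(sets), ".rda"))
for (i in seq_along(sets)) {
  save(list = names(sets)[i], envir = env,
       file = single_files[i], compress = "xz")
}

# Variant 2: all data sets bundled into a single xz-compressed .rda file.
bundle_file <- file.path(out, "bundle.rda")
save(list = names(sets), envir = env, file = bundle_file, compress = "xz")

# Total bytes on disk for each variant.
sizes <- c(separate = sum(file.size(single_files)),
           bundled  = file.size(bundle_file))
print(sizes)
```

Comparing the two entries of sizes shows how much of the footprint comes from per-file overhead rather than from the data itself.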