I am testing out the next release of survival, which involves running R CMD check on 868 CRAN packages that import, depend on, or suggest it.

The survival package has a lot of data sets, most of which are non-trivial real examples (something I'm proud of). To save space I've bundled many of them, e.g., data/cancer.rda has 19 different data frames.

This caused failures in 4 packages, each because they have a line such as data(lung) or data(breast, package = "survival"), and the data() command looks for a file name.

This is a question about which option is considered the best (perhaps more of a poll), between two choices:

1. Unbundle them again (bundling does save 1/3 of the space, and I do get complaints from R CMD build about size).
2. Send notes to the 4 maintainers. The help files for the data sets have the usage documented as "lung" or "breast", and not data(lung), so I am technically justified in claiming they have a mistake.

A third option, to make the data sets a separate package, is not on the table. I use them heavily in my help files and test suite, and since survival is a recommended package I can't add library(x) statements for !(x %in% recommended). I am guessing that this would also break many dependent packages.

Terry T.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
therneau at mayo.edu

"TERR-ree THUR-noh"
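For readers following the thread, the failure mode described above can be sketched in base R. This is only an illustration under stated assumptions: the two data frames are made-up stand-ins (not the real survival data), the temporary directory is hypothetical, and the sketch does not call data() itself; it only shows that a bundled file exposes one file name while holding several objects.

```r
# Hypothetical stand-ins for two of the bundled data frames (made-up values).
lung   <- data.frame(time = c(10, 20), status = c(1, 0))
breast <- data.frame(time = c(5, 15),  status = c(0, 1))

# Bundle both objects into a single file, in the spirit of data/cancer.rda.
data_dir <- tempfile("data")
dir.create(data_dir)
bundle <- file.path(data_dir, "cancer.rda")
save(lung, breast, file = bundle)

# Only the file stem "cancer" exists on disk; there is no lung.rda or
# breast.rda, so a lookup by file name for "lung" has nothing to find.
stems <- tools::file_path_sans_ext(list.files(data_dir))
has_lung_file <- "lung" %in% stems

# Loading the bundle restores every object stored in it.
env <- new.env()
loaded <- load(bundle, envir = env)
print(stems)
print(sort(loaded))
```

In a dependent package, referring to the object by name (for example survival::lung), as the help files document, sidesteps the file-name lookup. That assumes the package lazy-loads its data, which the thread does not state explicitly.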
Dear Terry,

Option 2 looks the best to me. They have a relatively simple change to make and there are only four of them.

Michael

On 16/02/2021 14:39, Therneau, Terry M., Ph.D. via R-devel wrote:
> [...]

-- 
Michael
http://www.dewey.myzen.co.uk/home.html
I would recommend option 2. I have done that when changes to xtable broke some packages. xtable has a number of dependencies, but not on the scale of survival. Just 4 packages out of 868 seems minimal to me.

David Scott

On 17/02/2021 3:39 am, Therneau, Terry M., Ph.D. via R-devel wrote:
> [...]
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
David Scott
Department of Statistics
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Email: d.scott at auckland.ac.nz