thr3ads.net - R devel - [Rd] issue with data() [Feb 2021]

Therneau, Terry M., Ph.D.

2021-Feb-16 14:39 UTC

[Rd] issue with data()

I am testing the next release of survival, which involves running R CMD check
on the 868 CRAN packages that import, depend on, or suggest it.

The survival package has a lot of data sets, most of which are non-trivial real
examples (something I'm proud of). To save space I've bundled many of them;
e.g., data/cancer.rda has 19 different data frames.

This caused failures in 4 packages, each because they have a line such as
data(lung) or data(breast, package = "survival"), and the data() command looks
for a file name, not for the name of an object stored inside a file.
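The file-name lookup can be reproduced without touching survival. The sketch
below is only an illustration: the directory and the names demo_lung,
demo_breast and demo_cancer are hypothetical stand-ins, and it relies on the
documented behaviour that data(), when given no package, also searches the
data/ subdirectory of the current working directory.

```r
# Hypothetical stand-in for a package's data/ directory: one bundled file,
# demo_cancer.rda, holding two data frames (names invented for this sketch).
demo_dir <- file.path(tempdir(), "bundle-demo")
dir.create(file.path(demo_dir, "data"), recursive = TRUE, showWarnings = FALSE)

demo_lung   <- data.frame(time = c(306, 455), status = c(2, 2))
demo_breast <- data.frame(time = c(12, 40), status = c(0, 1))
save(demo_lung, demo_breast,
     file = file.path(demo_dir, "data", "demo_cancer.rda"))
rm(demo_lung, demo_breast)

old_wd <- setwd(demo_dir)   # data() also searches ./data when package is NULL
target <- new.env()         # load into a scratch environment, not the workspace

# Asking for an object name: no file called demo_lung.* exists, so data()
# warns that the data set was not found and loads nothing.
by_object <- tryCatch(
  utils::data(list = "demo_lung", envir = target),
  warning = function(w) conditionMessage(w)
)
found_by_object_name <- exists("demo_lung", envir = target, inherits = FALSE)

# Asking for the file name: the whole bundle is loaded in one call.
utils::data(list = "demo_cancer", envir = target)
loaded <- sort(ls(target))

setwd(old_wd)
print(found_by_object_name)
print(loaded)
```

In this toy setup only the call that names the file loads anything, which
matches the failures described above.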

This is a question about which option is considered best (perhaps more of a
poll), between two choices:

1. Unbundle them again. (Bundling does save 1/3 of the space, and I do get
complaints from R CMD build about size.)
2. Send notes to the 4 maintainers. The help files for the data sets have the
usage documented as "lung" or "breast", and not data(lung), so I am technically
legal to claim they have a mistake.
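For option 2, the change on the maintainers' side could be as small as the
sketch below. It assumes an installed survival whose data sets are lazy-loaded,
so that a data set can be referenced by name as the help files document; the
choice of columns is only for display.

```r
# Sketch of the downstream fix: refer to the data set by name instead of
# calling data(lung). Guarded so the snippet is harmless if survival is absent.
if (requireNamespace("survival", quietly = TRUE)) {
  lung <- survival::lung               # no data() call needed
  str(lung[, c("time", "status")])     # two of the documented columns
}
```

A package that attaches or imports survival can also simply use lung directly,
without the survival:: prefix.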

A third option, making the data sets a separate package, is not on the table. I
use them heavily in my help files and test suite, and since survival is a
recommended package I can't add library(x) statements for
!(x %in% recommended). I am guessing that this would also break many dependent
packages.

Terry T.

-- 
Terry M Therneau, PhD
Department of Health Science Research
Mayo Clinic
therneau at mayo.edu

"TERR-ree THUR-noh"



Michael Dewey

2021-Feb-16 16:20 UTC

Dear Terry

Option 2 looks the best to me. They have a relatively simple change to 
make and there are only four of them.

Michael

-- 
Michael
http://www.dewey.myzen.co.uk/home.html

David Scott

2021-Feb-17 09:28 UTC

I would recommend option 2. I have done that when changes to xtable broke some
packages. xtable has a number of dependencies but not on the scale of survival.
Just 4 packages out of 868 seems minimal to me.

David Scott


--
_________________________________________________________________
David Scott
Department of Statistics
The University of Auckland, PB 92019
Auckland 1142,    NEW ZEALAND
Email:    d.scott at auckland.ac.nz
