I was glad to see the new rpart.plot package by Stephen Milborrow. I was, however, a bit concerned that Stephen distributed a dataset I created and renamed it in the process (from titanic3 to ptitanic) [with some justification, as some variables were omitted]. Fortunately, Stephen included the script he used to download the dataset from our web site and gave full credit to us.

What concerns me is that the rpart.plot package does not contain many functions, yet it is as large as packages containing hundreds of functions. This is due to the inclusion of the dataset. I would prefer that authors provide the URL so that users can easily install the R binary data frame directly from our web site (we even have an automated way to do this: require(Hmisc); getHdata(titanic3)). This will allow users to benefit from possible future data corrections and will also make the package much more compact.

Thanks for listening. I'm writing to r-help because this may apply to other R packages as well.

Frank
-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Issue-with-dataset-inclusion-in-CRAN-packages-tp3626536p3626536.html
Sent from the R help mailing list archive at Nabble.com.
I was wrong about this. The dataset is small; most of the space is taken up by a nice tutorial on rpart.plot. Still, I would favor linking to datasets rather than duplicating part of them.

Thanks,
Frank
On 26/6/2011 17:43, Frank Harrell wrote:
> I was glad to see the new rpart.plot package by Stephen Milborrow. [...]

Frank,

I can understand your concern, and at first thought I would even second it. On the other hand, I think there are reasonable explanations why authors prefer to include the datasets, especially if the data will be used in examples:

1) Docs written based on the datasets stay in sync with the data frames offered with the package;
2) In several environments, access to the web may be restricted, and getHdata or read.table("<url>") may not be allowed.

my 0.019999...

Regards,
--
Cesar Rabak