John Maindonald
2008-Dec-05 00:33 UTC
[R] adding a new dataset to the default R distribution
Making data widely available, especially data that have been the subject of published papers, can be a useful spinoff from the R project. It is another gift to the scientific community, beyond the provision of computing and analytic tools. Nowadays there is every reason for the data to be part of any complete publication of a scientific result. The 2004 paper by Gentleman and Lang, "Statistical Analyses and Reproducible Research" (http://www.bepress.com/bioconductor/paper2/), takes this further still, making a compelling case for opening the analysis itself to ready view. How else can critics know what analysis was done, and whether the data really do support the claimed conclusions?

As I see it, the first recourse should be to use archives that individual communities may establish. Brief instructions on how to read the data into R would be a useful small item of ancillary information. Links to such archives (under "Data Archives", maybe) might be included on CRAN. The Open Archaeology project would seem a good umbrella for archiving archaeology data.

Where there is no available repository, or where there are reasons for putting the data into an R package, one possibility is to advertise on this list: "Orphan dataset, looking for a good home". In this case I have offered to include the data in the DAAGxtras package, and I am open to further such requests. Perhaps, however, there should be a "miscdata" or similar package to which such datasets can be submitted? All it would require is for someone to offer to act as "Keeper of the Miscellaneous Data".

John Maindonald
email: john.maindonald@anu.edu.au
phone: +61 2 (6125)3473
fax: +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 04/12/2008, at 10:00 PM, r-help-request@r-project.org wrote:

> From: Stefano Costa <steko@iosa.it>
> Date: 3 December 2008 9:29:13 PM
> To: r-help@r-project.org
> Subject: [R] adding a new dataset to the default R distribution
>
> Hi,
> I am a student in archaeology with some interest in statistics and R.
>
> Recently I've obtained the permission to distribute in the public domain
> a small dataset (named "spearheads" with 40 obs. of 14 variables) that
> was used in an introductory statistics book for archaeologists
> (published in 1994).
>
> I've rewritten most of the exercises of that book in R and made them
> available at http://wiki.iosa.it/diggingnumbers:start along with the
> original dataset, but I was wondering if there is a standard procedure
> for getting a new dataset included in the default R distribution, like
> "cars" and others.
>
> Please add me to CC if replying because I'm not subscribed to the
> list.
>
> Best regards,
> Stefano
>
> --
> Stefano Costa
> http://www.iosa.it/ Open Archaeology