Where I work, a lot of people end up using Excel spreadsheets for storing data. This has limitations and some less-than-obvious problems. I'd like to recommend a uniform way of storing and archiving data collected in the department. Most of the data could be stored in simple CSV-type files, but it would be nice to have something that stores more information about the variables and units. NetCDF seems like overkill (and not easy for casual users); the same goes for PostgreSQL and MySQL databases.

Could someone recommend a system for storing relatively small data sets (50-100 variables, <1000 records) that is reliable, safe, and easy for people to view and edit, that works nicely with R, and that is open source? Am I asking for the moon?

Rick B.
rab45+ at pitt.edu wrote:
> Where I work a lot of people end up using Excel spreadsheets for storing
> data. [...] Could someone recommend some system for storing relatively
> small data sets (50-100 variables, <1000 records) that would be reliable,
> safe, and easy for people to view and edit their data that works nicely
> with R and is open source? Am I asking for the moon?

Would the StatDataML format meet your needs? It is open, XML-based, stores variable types, and works nicely with R (R wizards designed StatDataML and the corresponding R package). See

http://cran.r-project.org/src/contrib/Descriptions/StatDataML.html

or

http://www.omegahat.org/StatDataML/

HTH,
Tobias
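A minimal sketch of what the StatDataML round trip might look like, assuming the StatDataML package from CRAN is installed; `writeSDML()` and `readSDML()` are the package's read/write entry points:

```r
# Sketch of saving and reloading a data frame via StatDataML
# (assumes install.packages("StatDataML") has been run).
library(StatDataML)

dat <- data.frame(id     = 1:3,
                  weight = c(71.2, 68.0, 80.5),           # kg
                  group  = factor(c("a", "b", "a")))

writeSDML(dat, file = "dat.sdml")   # XML file records variable types
dat2 <- readSDML("dat.sdml")        # comes back as a data frame
```

Because the format is XML, the resulting file is plain text and can be inspected (or archived) alongside the data it describes.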
rab45+ at pitt.edu wrote:
> Where I work a lot of people end up using Excel spreadsheets for storing
> data. [...] Could someone recommend some system for storing relatively
> small data sets (50-100 variables, <1000 records) that would be reliable,
> safe, and easy for people to view and edit their data that works nicely
> with R and is open source? Am I asking for the moon?
>
> Rick B.

What I use is the facilities in the Hmisc package, which handles variable labels and units of measurement and has functions for importing data (saving labels in the appropriate place) and for making use of the attributes (e.g., combining labels and units, with a smaller font for the units portion, in an axis label). When such an annotated data frame is saved using save(..., compress=TRUE), load()'ing it back will provide the annotated data frame, quickly. The contents() function can show the attributes, and we use html(contents()) to put up a web page with hyperlinks for value labels (the factor variable levels attribute).

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
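A short sketch of the Hmisc workflow described above, assuming the Hmisc package is installed; `label()` and `units()` attach the annotation as attributes, and `contents()` summarizes it:

```r
# Sketch: annotating a data frame with labels and units via Hmisc
# (assumes install.packages("Hmisc") has been run).
library(Hmisc)

d <- data.frame(sbp = c(120, 135, 128), age = c(54, 61, 47))
label(d$sbp) <- "Systolic blood pressure"
units(d$sbp) <- "mmHg"
label(d$age) <- "Age"
units(d$age) <- "years"

save(d, file = "d.rda", compress = TRUE)  # annotations travel with the object
load("d.rda")                             # d comes back fully annotated
contents(d)                               # shows labels, units, storage modes
# html(contents(d))  # would write a browsable web page of the same summary
```

Because the labels and units live in attributes of the data frame itself, a single `save()`/`load()` pair preserves everything with no separate data dictionary to keep in sync.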
You might want to look at SQLite (http://www.sqlite.org/). There is already an R package for accessing these databases, and the website shows some GUI interfaces that may be easy enough for the casual user. Best of all, it is small, quick, free, and open source.

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111

>>> <rab45+ at pitt.edu> 09/30/05 10:26PM >>>
Where I work a lot of people end up using Excel spreadsheets for storing data. [...] Could someone recommend some system for storing relatively small data sets (50-100 variables, <1000 records) that would be reliable, safe, and easy for people to view and edit their data that works nicely with R and is open source? Am I asking for the moon?

Rick B.

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
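A minimal sketch of using an SQLite file from R, assuming the DBI and RSQLite packages are installed; the database lives in a single ordinary file, which makes it easy to archive or email:

```r
# Sketch: writing and querying a file-based SQLite database from R
# (assumes install.packages("RSQLite") has been run).
library(RSQLite)

con <- dbConnect(SQLite(), dbname = "dept.sqlite")  # one file = one database
dbWriteTable(con, "measurements",
             data.frame(id = 1:3, value = c(0.12, 0.34, 0.56)),
             overwrite = TRUE)
res <- dbGetQuery(con, "SELECT * FROM measurements WHERE value > 0.2")
dbDisconnect(con)
```

Since the whole database is one file, people who don't use R can still open and edit it with any of the graphical SQLite front ends listed on the SQLite website.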