On Sun, 9 Feb 2003 16:29:51 EST
TyagiAnupam at aol.com wrote:
> Hi R users,
>
> I am new to using a DBMS with R for large datasets. Thanks to all who
> responded with useful suggestions to my earlier postings about using large
> datasets and a DBMS with R. I am writing to get some help on how to design
> good tables in the DBMS to take full advantage of the wonderful built-in
> facilities in R, like labels.
>
> I am using the RMySQL client. Because R makes good use of variable and value
> labels and data (column) types, I would like to create tables with an
> appropriate design in terms of:
> (1) the datatype (char, varchar, int, etc.) in the DBMS, such that it
> corresponds to the appropriate datatype in R (factor, numeric, etc.) when
> converted;
> (2) how best to store variable and value labels and formats in the DBMS, so
> that they are correctly included in the data.frame that DBMS clients like
> RMySQL create for use in R.
> If I had only a few variables and values this would not be a problem; I could
> use meaningful variable names or create labels directly in R. But with 1600
> variables, many with about 10 categorical values, this approach does not look
> promising. Is there a document somewhere that addresses this issue? What
> would be a good way to solve this problem?
> Anupam.
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
We are working on a PostgreSQL-based system in which all metadata are defined in
XML. Ultimately I will interpret the XML metadata in R to fetch the variable
labels. In the Hmisc library I have a label function that makes it easy to assign
a 'label' attribute to an individual variable, and a function upData
that makes it easy to assign many labels at once. I will use the same
'label' attribute that these functions use when fetching labels from XML.
Another possibility is to make a table defining variable-specific metadata.
Then you could just read in the table and write a short function to pull out
labels after matching on variable names, assigning the labels to an attribute of
your choosing.
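
A minimal sketch of the metadata-table idea, in base R only. The table and
column names (var_meta, varname, label), the example data, and the helper
apply_labels are all hypothetical; in practice both data frames would
probably come from queries through the database client rather than being
typed in.

```r
# Hypothetical metadata table: one row per variable, holding its label.
# In a real setup this would be read from a table in the DBMS.
var_meta <- data.frame(
  varname = c("age", "sex", "sbp"),
  label   = c("Age in years", "Sex of respondent",
              "Systolic blood pressure (mmHg)"),
  stringsAsFactors = FALSE
)

# Example data standing in for the result of a query.
dat <- data.frame(
  age    = c(34, 51),
  sex    = c("F", "M"),
  income = c(1, 2)
)

# Match column names against the metadata and store each matching label
# in an attribute of the column (attribute name chosen by the caller).
apply_labels <- function(df, meta, attr_name = "label") {
  idx <- match(names(df), meta$varname)
  for (j in which(!is.na(idx))) {
    attr(df[[j]], attr_name) <- meta$label[idx[j]]
  }
  df
}

dat <- apply_labels(dat, var_meta)
attr(dat$age, "label")
```

Columns with no row in the metadata table (income here) are left unlabelled.
Note that a plain attribute like this is generally lost when a vector is
subset with `[`, so labels may need to be reapplied after subsetting.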
--
Frank E Harrell Jr Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat