This is my first post to this list so I suppose a quick intro is in order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, and love S already. I'm reading MASS and also John Fox's book - both have been very useful. My background in stat software was mainly SPSS (which I've never much liked - thanks heavens I've found S!), and Perl is my tool of choice for general-purpose programming (I chaired the perl6-language-data working group, responsible for improving the data analysis capabilities in Perl). I have just completed my first S project, and I now have 8 lm.objects. The models are all reasonably complex with multiple numeric and factor variables and some 2-way and 3-way interactions. I now need to use these models in other environments, such as C code, SQL functions (using CASE) and in Perl - I can not work out how to do this. The difficulty I am having is that the output of coef() is not really parsable, since there is no marker in the name of an coefficient of separate out the components. For instance, in SPSS the name of a coefficient might be: var1=[a]*var2=[b]*var3 ...which is easy to write a little script to pull that apart and turn it into a line of SQL, C, or whatever. In S however the name looks like: var1avar2bvar3 ...which provides no way to pull the bits apart. So my question is, how do I export an lm.object in some form that I can then apply to prediction in C, SQL, or some other language? All I'm looking for is some well-structured textual or data frame output that I can then manipulate with appropriate tools, whether it be S itself, or something like Perl. Thanks in advance for any suggestions (and apologies in advance if this is well documented somewhere!), Jeremy
j+rhelp at howard.fm wrote:> This is my first post to this list so I suppose a quick intro is in > order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, > and love S already. I'm reading MASS and also John Fox's book - both have > been very useful. My background in stat software was mainly SPSS (which > I've never much liked - thanks heavens I've found S!), and Perl is my > tool of choice for general-purpose programming (I chaired the > perl6-language-data working group, responsible for improving the data > analysis capabilities in Perl). > > I have just completed my first S project, and I now have 8 lm.objects. > The models are all reasonably complex with multiple numeric and factor > variables and some 2-way and 3-way interactions. I now need to use these > models in other environments, such as C code, SQL functions (using CASE) > and in Perl - I can not work out how to do this. > > The difficulty I am having is that the output of coef() is not really > parsable, since there is no marker in the name of an coefficient of > separate out the components. For instance, in SPSS the name of a > coefficient might be: > > var1=[a]*var2=[b]*var3 > > ...which is easy to write a little script to pull that apart and turn it > into a line of SQL, C, or whatever. In S however the name looks like: > > var1avar2bvar3>> ...which provides no way to pull the bits apart. > > So my question is, how do I export an lm.object in some form that I can > then apply to prediction in C, SQL, or some other language? All I'm > looking for is some well-structured textual or data frame output that I > can then manipulate with appropriate tools, whether it be S itself, or > something like Perl. > > Thanks in advance for any suggestions (and apologies in advance if this > is well documented somewhere!), > > Jeremy >See ?dump Uwe Ligges
ripley@stats.ox.ac.uk
2003-Feb-14 09:07 UTC
[R] Translating lm.object to SQL, C, etc function
The issue here is that coef() tells you the coefficients in R's internal parametrization of the model, and that is of no use to you unless you have a means of creating a model matrix in C, SQL or (heaven forbid) Perl. The information needed to re-create a model matrix is stored in the lm fit, but in ways that are going to be hard to use anywhere else (since they include R functions). This is not perverse: what R does is very general, *far* more so than SPSS. Formulae in lm can include poly() and ns() terms, for example. The only practical solution it seems to us is to ask R to create the model matrix for new data. Then the things you are talking about are just the colnames of that matrix, and don't need to be interpreted. You may want to read the sources to find out how R does it: that area is one of the most complex parts of the internals, and one in which bugs continue to emerge. On Fri, 14 Feb 2003 j+rhelp at howard.fm wrote:> This is my first post to this list so I suppose a quick intro is in > order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, > and love S already. I'm reading MASS and also John Fox's book - both have > been very useful. My background in stat software was mainly SPSS (which > I've never much liked - thanks heavens I've found S!), and Perl is my > tool of choice for general-purpose programming (I chaired the > perl6-language-data working group, responsible for improving the data > analysis capabilities in Perl). > > I have just completed my first S project, and I now have 8 lm.objects. > The models are all reasonably complex with multiple numeric and factor > variables and some 2-way and 3-way interactions. I now need to use these > models in other environments, such as C code, SQL functions (using CASE) > and in Perl - I can not work out how to do this. > > The difficulty I am having is that the output of coef() is not really > parsable, since there is no marker in the name of an coefficient of > separate out the components. For instance, in SPSS the name of a > coefficient might be: > > var1=[a]*var2=[b]*var3 > > ...which is easy to write a little script to pull that apart and turn it > into a line of SQL, C, or whatever. In S however the name looks like: > > var1avar2bvar3 > > ...which provides no way to pull the bits apart.I find that impossible to understand anyway, but doubt that it corresponds to SPSS. For a variable V, label Va does not mean V=[a] except in unusual special cases.> So my question is, how do I export an lm.object in some form that I can > then apply to prediction in C, SQL, or some other language? All I'm > looking for is some well-structured textual or data frame output that I > can then manipulate with appropriate tools, whether it be S itself, or > something like Perl. > > Thanks in advance for any suggestions (and apologies in advance if this > is well documented somewhere!), > > Jeremy > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > http://www.stat.math.ethz.ch/mailman/listinfo/r-help >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Frank E Harrell Jr
2003-Feb-14 12:53 UTC
[R] Translating lm.object to SQL, C, etc function
On Fri, 14 Feb 2003 16:37:42 +1100 j+rhelp at howard.fm wrote:> This is my first post to this list so I suppose a quick intro is in > order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, > and love S already. I'm reading MASS and also John Fox's book - both have > been very useful. My background in stat software was mainly SPSS (which > I've never much liked - thanks heavens I've found S!), and Perl is my > tool of choice for general-purpose programming (I chaired the > perl6-language-data working group, responsible for improving the data > analysis capabilities in Perl). > > I have just completed my first S project, and I now have 8 lm.objects. > The models are all reasonably complex with multiple numeric and factor > variables and some 2-way and 3-way interactions. I now need to use these > models in other environments, such as C code, SQL functions (using CASE) > and in Perl - I can not work out how to do this. > > The difficulty I am having is that the output of coef() is not really > parsable, since there is no marker in the name of an coefficient of > separate out the components. For instance, in SPSS the name of a > coefficient might be: > > var1=[a]*var2=[b]*var3 > > ...which is easy to write a little script to pull that apart and turn it > into a line of SQL, C, or whatever. In S however the name looks like: > > var1avar2bvar3 > > ...which provides no way to pull the bits apart. > > So my question is, how do I export an lm.object in some form that I can > then apply to prediction in C, SQL, or some other language? All I'm > looking for is some well-structured textual or data frame output that I > can then manipulate with appropriate tools, whether it be S itself, or > something like Perl. > > Thanks in advance for any suggestions (and apologies in advance if this > is well documented somewhere!), > > Jeremy >Some functions that may give you some ideas, from the Design library (http://hesweb1.med.virginia.edu/biostat/s/Design.html).: Function(fit): generate S function to obtain predicted values from a regression fit that was done with Design in effect (i.e., fit with ols, cph, lrm, psm, glmD) latex(fit): generate LaTeX code for typesetting the model fit sascode(Function(fit)): translate formula to SAS notation What I think would be very useful would be a function like Function that instead symbolically creates the design matrix, and translating that function to SQL etc. This would allow computation of confidence limits. -- Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
Dear Jeremy, I've written replacements for the standard R contrast functions that produce the kind of more easily parsed (and more readable) contrast names that I think you have in mind. I intend to include these in the next release of the car package for R but haven't done so yet. Since the code isn't very long, I've appended it (and the .Rd documentation file to this note). Note that R does separate terms in an interaction with a colon. I hope that this does what you need. John ---------------------------- Contrasts.R ----------------------------- # last modified 2 Dec 2002 by J. Fox # all of these functions are adapted from functions in the R base package contr.Treatment <- function (n, base = 1, contrasts = TRUE) { if (is.numeric(n) && length(n) == 1) levs <- 1:n else { levs <- n n <- length(n) } lev.opt <- getOption("decorate.contrasts") pre <- if (is.null(lev.opt)) "[" else lev.opt[1] suf <- if (is.null(lev.opt)) "]" else lev.opt[2] dec <- getOption("decorate.contr.Treatment") dec <- if (!contrasts) "" else if (is.null(dec)) "T." else dec contr.names <- paste(pre, dec, levs, suf, sep="") contr <- array(0, c(n, n), list(levs, contr.names)) diag(contr) <- 1 if (contrasts) { if (n < 2) stop(paste("Contrasts not defined for", n - 1, "degrees of freedom")) if (base < 1 | base > n) stop("Baseline group number out of range") contr <- contr[, -base, drop = FALSE] } contr } contr.Sum <- function (n, contrasts = TRUE) { if (length(n) <= 1) { if (is.numeric(n) && length(n) == 1 && n > 1) levels <- 1:n else stop("Not enough degrees of freedom to define contrasts") } else levels <- n lenglev <- length(levels) lev.opt <- getOption("decorate.contrasts") pre <- if (is.null(lev.opt)) "[" else lev.opt[1] suf <- if (is.null(lev.opt)) "]" else lev.opt[2] dec <- getOption("decorate.contr.Sum") dec <- if (!contrasts) "" else if (is.null(dec)) "S." else dec show.lev <- getOption("contr.Sum.show.levels") contr.names <- if ((is.null(show.lev)) || show.lev) paste(pre, dec, levels, suf, sep="") if (contrasts) { cont <- array(0, c(lenglev, lenglev - 1), list(levels, contr.names[-lenglev])) cont[col(cont) == row(cont)] <- 1 cont[lenglev, ] <- -1 } else { cont <- array(0, c(lenglev, lenglev), list(levels, contr.names)) cont[col(cont) == row(cont)] <- 1 } cont } contr.Helmert <- function (n, contrasts = TRUE) { if (length(n) <= 1) { if (is.numeric(n) && length(n) == 1 && n > 1) levels <- 1:n else stop("contrasts are not defined for 0 degrees of freedom") } else levels <- n lenglev <- length(levels) lev.opt <- getOption("decorate.contrasts") pre <- if (is.null(lev.opt)) "[" else lev.opt[1] suf <- if (is.null(lev.opt)) "]" else lev.opt[2] dec <- getOption("decorate.contr.Helmert") dec <- if (!contrasts) "" else if (is.null(dec)) "H." else dec nms <- if (contrasts) 1:lenglev else levels contr.names <- paste(pre, dec, nms, suf, sep="") if (contrasts) { cont <- array(-1, c(lenglev, lenglev - 1), list(levels, contr.names[-lenglev])) cont[col(cont) <= row(cont) - 2] <- 0 cont[col(cont) == row(cont) - 1] <- 1:(lenglev - 1) } else { cont <- array(0, c(lenglev, lenglev), list(levels, contr.names)) cont[col(cont) == row(cont)] <- 1 } cont } ------------------------------- Contrasts.Rd ------------------------------------------ \name{Contrasts} \alias{Contrasts} \alias{contr.Treatment} \alias{contr.Sum} \alias{contr.Helmert} \title{Functions to Construct Contrasts} \description{ These are substitutes for similarly named functions in the base package (note the uppercase letter starting the second word in each function name). The only difference is that the contrast functions from the car package produce easier-to-read names for the contrasts when they are used in statistical models. The functions and this documentation are adapted from the base package. } \usage{ contr.Treatment(n, base = 1, contrasts = TRUE) contr.Sum(n, contrasts = TRUE) contr.Helmert(n, contrasts = TRUE) } \arguments{ \item{n}{a vector of levels for a factor, or the number of levels.} \item{base}{an integer specifying which level is considered the baseline level. Ignored if \code{contrasts} is \code{FALSE}.} \item{contrasts}{a logical indicating whether contrasts should be computed.} } \details{ These functions are used for creating contrast matrices for use in fitting analysis of variance and regression models. The columns of the resulting matrices contain contrasts which can be used for coding a factor with \code{n} levels. The returned value contains the computed contrasts. If the argument \code{contrasts} is \code{FALSE} then a square matrix is returned. Several aspects of these contrast functions are controlled by options set via the \code{options} command: \describe{ \item{\code{decorate.contrasts}}{This option should be set to a 2-element character vector containing the prefix and suffix characters to surround contrast names. If the option is not set, then \code{c("[", "]")} is used. For example, setting \code{options(decorate.contrasts=c(".", ""))} produces contrast names that are separated from factor names by a period. Setting \code{options(decorate.contrasts=c("", ""))} reproduces the behaviour of the R base contrast functions.} \item{\code{decorate.contr.Treatment}}{A character string to be appended to contrast names to signify treatment contrasts; if the option is unset, then \code{"T."} is used.} \item{\code{decorate.contr.Sum}}{Similar to the above, with default \code{"S."}.} \item{\code{decorate.contr.Helmert}}{Similar to the above, with default \code{"H."}.} \item{\code{contr.Sum.show.levels}}{Logical value: if \code{TRUE} (the default if unset), then level names are used for contrasts; if \code{FALSE}, then numbers are used, as in \code{contr.sum} in the \code{base} package.} } Note that there is no replacement for \code{contr.poly} in the \code{base} package (which produces orthogonal-polynomial contrasts) since this function already constructs easy-to-read contrast names. } \value{ A matrix with \code{n} rows and \code{k} columns, with \code{k = n - 1} if \code{contrasts} is \code{TRUE} and \code{k = n} if \code{contrasts} is \code{FALSE}. } \author{John Fox \email{jfox at mcmaster.ca}} \seealso{\code{\link[base]{contr.treatment}}, \code{\link[base]{contr.sum}}, \code{\link[base]{contr.helmert}}, \code{\link[base]{contr.poly}} } \examples{ # contr.Treatment vs. contr.treatment in the base package: data(Prestige) lm(prestige ~ (income + education)*type, data=Prestige, contrasts=list(type="contr.Treatment")) ## Call: ## lm(formula = prestige ~ (income + education) * type, data = Prestige, ## contrasts = list(type = "contr.Treatment")) ## ## Coefficients: ## (Intercept) income education ## 2.275753 0.003522 1.713275 ## type[T.prof] type[T.wc] income:type[T.prof] ## 15.351896 -33.536652 -0.002903 ## income:type[T.wc] education:type[T.prof] education:type[T.wc] ## -0.002072 1.387809 4.290875 lm(prestige ~ (income + education)*type, data=Prestige, contrasts=list(type="contr.treatment")) ## Call: ## lm(formula = prestige ~ (income + education) * type, data = Prestige, ## contrasts = list(type = "contr.treatment")) ## ## Coefficients: ## (Intercept) income education ## 2.275753 0.003522 1.713275 ## typeprof typewc income:typeprof ## 15.351896 -33.536652 -0.002903 ## income:typewc education:typeprof education:typewc ## -0.002072 1.387809 4.290875 } \keyword{models} \keyword{regression} ------------------------------------------------------------------------------------------------------------------------------------------------- At 04:37 PM 2/14/2003 +1100, j+rhelp at howard.fm wrote:>This is my first post to this list so I suppose a quick intro is in >order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, >and love S already. I'm reading MASS and also John Fox's book - both have >been very useful. My background in stat software was mainly SPSS (which >I've never much liked - thanks heavens I've found S!), and Perl is my >tool of choice for general-purpose programming (I chaired the >perl6-language-data working group, responsible for improving the data >analysis capabilities in Perl). > >I have just completed my first S project, and I now have 8 lm.objects. >The models are all reasonably complex with multiple numeric and factor >variables and some 2-way and 3-way interactions. I now need to use these >models in other environments, such as C code, SQL functions (using CASE) >and in Perl - I can not work out how to do this. > >The difficulty I am having is that the output of coef() is not really >parsable, since there is no marker in the name of an coefficient of >separate out the components. For instance, in SPSS the name of a >coefficient might be: > > var1=[a]*var2=[b]*var3 > >...which is easy to write a little script to pull that apart and turn it >into a line of SQL, C, or whatever. In S however the name looks like: > > var1avar2bvar3 > >...which provides no way to pull the bits apart. > >So my question is, how do I export an lm.object in some form that I can >then apply to prediction in C, SQL, or some other language? All I'm >looking for is some well-structured textual or data frame output that I >can then manipulate with appropriate tools, whether it be S itself, or >something like Perl. > >Thanks in advance for any suggestions (and apologies in advance if this >is well documented somewhere!),
On Fri, 14 Feb 2003 08:25:05 +0000 (GMT), ripley at stats.ox.ac.uk said:> On Fri, 14 Feb 2003 j+rhelp at howard.fm wrote: > > Thanks for the suggestion. After my last post I tried switching from > > SPLUS to R and discovered the useful xlevels attribute, which when output > > Eh? S-PLUS has an "xlevels" attribute, but R has an "xlevels" component > (speaks the author of both). >Thanks for the clarification.> You add methods to functions, not classes, in R. You could indeed add > generic accessor functions with lm methods, but their absence (and the > lack of documentation of the internal structure) should alert you to the > idea that this is internal structure and not part of a public API. >OK, I'll stay away from that then...> > In SPLUS I came across a useful attribute 'assign', which has a mapping > > of term names to variables - the same attribute in R doesn't appear to > > provide this information. Is this available somewhere? > > Both have an assign *component*, not attribute. R has a mapping from > model.matrix columns to terms in its assign component: that's not an > accurate description of S-PLUS's component .... >Now that I look again more closely I see what you mean. Sorry for the missing that. -- Jeremy Howard jhoward at fastmail.fm