This is my first post to this list so I suppose a quick intro is in order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, and love S already. I'm reading MASS and also John Fox's book - both have been very useful. My background in stat software was mainly SPSS (which I've never much liked - thanks heavens I've found S!), and Perl is my tool of choice for general-purpose programming (I chaired the perl6-language-data working group, responsible for improving the data analysis capabilities in Perl). I have just completed my first S project, and I now have 8 lm.objects. The models are all reasonably complex with multiple numeric and factor variables and some 2-way and 3-way interactions. I now need to use these models in other environments, such as C code, SQL functions (using CASE) and in Perl - I can not work out how to do this. The difficulty I am having is that the output of coef() is not really parsable, since there is no marker in the name of an coefficient of separate out the components. For instance, in SPSS the name of a coefficient might be: var1=[a]*var2=[b]*var3 ...which is easy to write a little script to pull that apart and turn it into a line of SQL, C, or whatever. In S however the name looks like: var1avar2bvar3 ...which provides no way to pull the bits apart. So my question is, how do I export an lm.object in some form that I can then apply to prediction in C, SQL, or some other language? All I'm looking for is some well-structured textual or data frame output that I can then manipulate with appropriate tools, whether it be S itself, or something like Perl. Thanks in advance for any suggestions (and apologies in advance if this is well documented somewhere!), Jeremy
j+rhelp at howard.fm wrote:> This is my first post to this list so I suppose a quick intro is in > order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, > and love S already. I'm reading MASS and also John Fox's book - both have > been very useful. My background in stat software was mainly SPSS (which > I've never much liked - thanks heavens I've found S!), and Perl is my > tool of choice for general-purpose programming (I chaired the > perl6-language-data working group, responsible for improving the data > analysis capabilities in Perl). > > I have just completed my first S project, and I now have 8 lm.objects. > The models are all reasonably complex with multiple numeric and factor > variables and some 2-way and 3-way interactions. I now need to use these > models in other environments, such as C code, SQL functions (using CASE) > and in Perl - I can not work out how to do this. > > The difficulty I am having is that the output of coef() is not really > parsable, since there is no marker in the name of an coefficient of > separate out the components. For instance, in SPSS the name of a > coefficient might be: > > var1=[a]*var2=[b]*var3 > > ...which is easy to write a little script to pull that apart and turn it > into a line of SQL, C, or whatever. In S however the name looks like: > > var1avar2bvar3>> ...which provides no way to pull the bits apart. > > So my question is, how do I export an lm.object in some form that I can > then apply to prediction in C, SQL, or some other language? All I'm > looking for is some well-structured textual or data frame output that I > can then manipulate with appropriate tools, whether it be S itself, or > something like Perl. > > Thanks in advance for any suggestions (and apologies in advance if this > is well documented somewhere!), > > Jeremy >See ?dump Uwe Ligges
ripley@stats.ox.ac.uk
2003-Feb-14  09:07 UTC
[R] Translating lm.object to SQL, C, etc function
The issue here is that coef() tells you the coefficients in R's internal parametrization of the model, and that is of no use to you unless you have a means of creating a model matrix in C, SQL or (heaven forbid) Perl. The information needed to re-create a model matrix is stored in the lm fit, but in ways that are going to be hard to use anywhere else (since they include R functions). This is not perverse: what R does is very general, *far* more so than SPSS. Formulae in lm can include poly() and ns() terms, for example. The only practical solution it seems to us is to ask R to create the model matrix for new data. Then the things you are talking about are just the colnames of that matrix, and don't need to be interpreted. You may want to read the sources to find out how R does it: that area is one of the most complex parts of the internals, and one in which bugs continue to emerge. On Fri, 14 Feb 2003 j+rhelp at howard.fm wrote:> This is my first post to this list so I suppose a quick intro is in > order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, > and love S already. I'm reading MASS and also John Fox's book - both have > been very useful. My background in stat software was mainly SPSS (which > I've never much liked - thanks heavens I've found S!), and Perl is my > tool of choice for general-purpose programming (I chaired the > perl6-language-data working group, responsible for improving the data > analysis capabilities in Perl). > > I have just completed my first S project, and I now have 8 lm.objects. > The models are all reasonably complex with multiple numeric and factor > variables and some 2-way and 3-way interactions. I now need to use these > models in other environments, such as C code, SQL functions (using CASE) > and in Perl - I can not work out how to do this. > > The difficulty I am having is that the output of coef() is not really > parsable, since there is no marker in the name of an coefficient of > separate out the components. For instance, in SPSS the name of a > coefficient might be: > > var1=[a]*var2=[b]*var3 > > ...which is easy to write a little script to pull that apart and turn it > into a line of SQL, C, or whatever. In S however the name looks like: > > var1avar2bvar3 > > ...which provides no way to pull the bits apart.I find that impossible to understand anyway, but doubt that it corresponds to SPSS. For a variable V, label Va does not mean V=[a] except in unusual special cases.> So my question is, how do I export an lm.object in some form that I can > then apply to prediction in C, SQL, or some other language? All I'm > looking for is some well-structured textual or data frame output that I > can then manipulate with appropriate tools, whether it be S itself, or > something like Perl. > > Thanks in advance for any suggestions (and apologies in advance if this > is well documented somewhere!), > > Jeremy > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > http://www.stat.math.ethz.ch/mailman/listinfo/r-help >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
Frank E Harrell Jr
2003-Feb-14  12:53 UTC
[R] Translating lm.object to SQL, C, etc function
On Fri, 14 Feb 2003 16:37:42 +1100 j+rhelp at howard.fm wrote:> This is my first post to this list so I suppose a quick intro is in > order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days, > and love S already. I'm reading MASS and also John Fox's book - both have > been very useful. My background in stat software was mainly SPSS (which > I've never much liked - thanks heavens I've found S!), and Perl is my > tool of choice for general-purpose programming (I chaired the > perl6-language-data working group, responsible for improving the data > analysis capabilities in Perl). > > I have just completed my first S project, and I now have 8 lm.objects. > The models are all reasonably complex with multiple numeric and factor > variables and some 2-way and 3-way interactions. I now need to use these > models in other environments, such as C code, SQL functions (using CASE) > and in Perl - I can not work out how to do this. > > The difficulty I am having is that the output of coef() is not really > parsable, since there is no marker in the name of an coefficient of > separate out the components. For instance, in SPSS the name of a > coefficient might be: > > var1=[a]*var2=[b]*var3 > > ...which is easy to write a little script to pull that apart and turn it > into a line of SQL, C, or whatever. In S however the name looks like: > > var1avar2bvar3 > > ...which provides no way to pull the bits apart. > > So my question is, how do I export an lm.object in some form that I can > then apply to prediction in C, SQL, or some other language? All I'm > looking for is some well-structured textual or data frame output that I > can then manipulate with appropriate tools, whether it be S itself, or > something like Perl. > > Thanks in advance for any suggestions (and apologies in advance if this > is well documented somewhere!), > > Jeremy >Some functions that may give you some ideas, from the Design library (http://hesweb1.med.virginia.edu/biostat/s/Design.html).: Function(fit): generate S function to obtain predicted values from a regression fit that was done with Design in effect (i.e., fit with ols, cph, lrm, psm, glmD) latex(fit): generate LaTeX code for typesetting the model fit sascode(Function(fit)): translate formula to SAS notation What I think would be very useful would be a function like Function that instead symbolically creates the design matrix, and translating that function to SQL etc. This would allow computation of confidence limits. -- Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat
Dear Jeremy,
I've written replacements for the standard R contrast functions that 
produce the kind of more easily parsed (and more readable) contrast names 
that I think you have in mind. I intend to include these in the next 
release of the car package for R but haven't done so yet. Since the code 
isn't very long, I've appended it (and the .Rd documentation file to
this
note). Note that R does separate terms in an interaction with a colon.
I hope that this does what you need.
  John
---------------------------- Contrasts.R -----------------------------
# last modified 2 Dec 2002 by J. Fox
# all of these functions are adapted from functions in the R base package
contr.Treatment <- function (n, base = 1, contrasts = TRUE) {
     if (is.numeric(n) && length(n) == 1)
         levs <- 1:n
     else {
         levs <- n
         n <- length(n)
     }
     lev.opt <- getOption("decorate.contrasts")
     pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
     suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
     dec <- getOption("decorate.contr.Treatment")
     dec <- if (!contrasts) ""
            else if (is.null(dec)) "T."
            else dec
     contr.names <- paste(pre, dec, levs, suf, sep="")
     contr <- array(0, c(n, n), list(levs, contr.names))
     diag(contr) <- 1
     if (contrasts) {
         if (n < 2)
             stop(paste("Contrasts not defined for", n - 1,
"degrees of
freedom"))
         if (base < 1 | base > n)
             stop("Baseline group number out of range")
         contr <- contr[, -base, drop = FALSE]
     }
     contr
}
contr.Sum <- function (n, contrasts = TRUE)
{
     if (length(n) <= 1) {
         if (is.numeric(n) && length(n) == 1 && n > 1)
             levels <- 1:n
         else stop("Not enough degrees of freedom to define
contrasts")
     }
     else levels <- n
     lenglev <- length(levels)
     lev.opt <- getOption("decorate.contrasts")
     pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
     suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
     dec <- getOption("decorate.contr.Sum")
     dec <- if (!contrasts) ""
            else if (is.null(dec)) "S."
            else dec
     show.lev <- getOption("contr.Sum.show.levels")
     contr.names <- if ((is.null(show.lev)) || show.lev) paste(pre, dec, 
levels, suf, sep="")
     if (contrasts) {
         cont <- array(0, c(lenglev, lenglev - 1), list(levels,
             contr.names[-lenglev]))
         cont[col(cont) == row(cont)] <- 1
         cont[lenglev, ] <- -1
     }
     else {
         cont <- array(0, c(lenglev, lenglev), list(levels,
             contr.names))
         cont[col(cont) == row(cont)] <- 1
     }
     cont
}
contr.Helmert <- function (n, contrasts = TRUE)
{
     if (length(n) <= 1) {
         if (is.numeric(n) && length(n) == 1 && n > 1)
             levels <- 1:n
         else stop("contrasts are not defined for 0 degrees of
freedom")
     }
     else levels <- n
     lenglev <- length(levels)
     lev.opt <- getOption("decorate.contrasts")
     pre <- if (is.null(lev.opt)) "[" else lev.opt[1]
     suf <- if (is.null(lev.opt)) "]" else lev.opt[2]
     dec <- getOption("decorate.contr.Helmert")
     dec <- if (!contrasts) ""
            else if (is.null(dec)) "H."
            else dec
     nms <- if (contrasts) 1:lenglev else levels
     contr.names <- paste(pre, dec, nms, suf, sep="")
     if (contrasts) {
         cont <- array(-1, c(lenglev, lenglev - 1), list(levels,
             contr.names[-lenglev]))
         cont[col(cont) <= row(cont) - 2] <- 0
         cont[col(cont) == row(cont) - 1] <- 1:(lenglev - 1)
     }
     else {
         cont <- array(0, c(lenglev, lenglev), list(levels, contr.names))
         cont[col(cont) == row(cont)] <- 1
     }
     cont
}
------------------------------- Contrasts.Rd 
------------------------------------------
\name{Contrasts}
\alias{Contrasts}
\alias{contr.Treatment}
\alias{contr.Sum}
\alias{contr.Helmert}
\title{Functions to Construct Contrasts}
\description{
     These are substitutes for similarly named functions in the base package
     (note the uppercase letter starting the second word in each function 
name).
     The only difference is that the contrast functions from the car package
     produce easier-to-read names for the contrasts when they are used in 
statistical models.
     The functions and this documentation are adapted from the base package.
     }
\usage{
contr.Treatment(n, base = 1, contrasts = TRUE)
contr.Sum(n, contrasts = TRUE)
contr.Helmert(n, contrasts = TRUE)
}
\arguments{
   \item{n}{a vector of levels for a factor, or the number of levels.}
   \item{base}{an integer specifying which level is considered the baseline 
level.
     Ignored if \code{contrasts} is \code{FALSE}.}
   \item{contrasts}{a logical indicating whether contrasts should be computed.}
}
\details{
     These functions are used for creating contrast matrices for use in 
fitting analysis of variance and regression models.
     The columns of the resulting matrices contain contrasts which can be 
used for coding a factor with \code{n} levels.
     The returned value contains the computed contrasts. If the argument 
\code{contrasts} is \code{FALSE} then a square matrix is returned.
     Several aspects of these contrast functions are controlled by options 
set via the \code{options} command:
     \describe{
         \item{\code{decorate.contrasts}}{This option should be set to a 
2-element character vector containing the prefix and suffix
             characters to surround contrast names. If the option is not 
set, then \code{c("[", "]")} is used. For example, setting
             \code{options(decorate.contrasts=c(".", ""))}
produces
contrast names that are separated from factor names by a period.
             Setting \code{options(decorate.contrasts=c("",
""))}
reproduces the behaviour of the R base contrast functions.}
         \item{\code{decorate.contr.Treatment}}{A character string to be 
appended to contrast names to signify treatment contrasts;
             if the option is unset, then \code{"T."} is used.}
         \item{\code{decorate.contr.Sum}}{Similar to the above, with 
default \code{"S."}.}
         \item{\code{decorate.contr.Helmert}}{Similar to the above, with 
default \code{"H."}.}
         \item{\code{contr.Sum.show.levels}}{Logical value: if \code{TRUE} 
(the default if unset),
             then level names are used for contrasts; if \code{FALSE}, then 
numbers are used, as in \code{contr.sum}
             in the \code{base} package.}
         }
     Note that there is no replacement for \code{contr.poly} in the 
\code{base} package (which produces
     orthogonal-polynomial contrasts) since this function already 
constructs easy-to-read contrast names.
}
\value{
     A matrix with \code{n} rows and \code{k} columns, with \code{k = n - 
1} if \code{contrasts} is \code{TRUE}
     and \code{k = n} if \code{contrasts} is \code{FALSE}.
}
\author{John Fox \email{jfox at mcmaster.ca}}
\seealso{\code{\link[base]{contr.treatment}}, \code{\link[base]{contr.sum}},
   \code{\link[base]{contr.helmert}}, \code{\link[base]{contr.poly}} }
\examples{
# contr.Treatment vs. contr.treatment in the base package:
data(Prestige)
lm(prestige ~ (income + education)*type, data=Prestige,
     contrasts=list(type="contr.Treatment"))
##  Call:
##  lm(formula = prestige ~ (income + education) * type, data = Prestige,
##      contrasts = list(type = "contr.Treatment"))
##
##  Coefficients:
##          (Intercept)                  income               education
##              2.275753                0.003522                1.713275
##          type[T.prof]              type[T.wc]     income:type[T.prof]
##              15.351896              -33.536652               -0.002903
##      income:type[T.wc]  education:type[T.prof]    education:type[T.wc]
##              -0.002072                1.387809                4.290875
lm(prestige ~ (income + education)*type, data=Prestige,
     contrasts=list(type="contr.treatment"))
##  Call:
##  lm(formula = prestige ~ (income + education) * type, data = Prestige,
##      contrasts = list(type = "contr.treatment"))
##
##  Coefficients:
##      (Intercept)              income           education
##          2.275753            0.003522            1.713275
##          typeprof              typewc     income:typeprof
##          15.351896          -33.536652           -0.002903
##      income:typewc  education:typeprof    education:typewc
##          -0.002072            1.387809            4.290875
}
\keyword{models}
\keyword{regression}
-------------------------------------------------------------------------------------------------------------------------------------------------
At 04:37 PM 2/14/2003 +1100, j+rhelp at howard.fm wrote:>This is my first post to this list so I suppose a quick intro is in
>order. I've been using SPLUS 2000 and R1.6.2 for just a couple of days,
>and love S already. I'm reading MASS and also John Fox's book - both
have
>been very useful. My background in stat software was mainly SPSS (which
>I've never much liked - thanks heavens I've found S!), and Perl is
my
>tool of choice for general-purpose programming (I chaired the
>perl6-language-data working group, responsible for improving the data
>analysis capabilities in Perl).
>
>I have just completed my first S project, and I now have 8 lm.objects.
>The models are all reasonably complex with multiple numeric and factor
>variables and some 2-way and 3-way interactions. I now need to use these
>models in other environments, such as C code, SQL functions (using CASE)
>and in Perl - I can not work out how to do this.
>
>The difficulty I am having is that the output of coef() is not really
>parsable, since there is no marker in the name of an coefficient of
>separate out the components. For instance, in SPSS the name of a
>coefficient might be:
>
>   var1=[a]*var2=[b]*var3
>
>...which is easy to write a little script to pull that apart and turn it
>into a line of SQL, C, or whatever. In S however the name looks like:
>
>   var1avar2bvar3
>
>...which provides no way to pull the bits apart.
>
>So my question is, how do I export an lm.object in some form that I can
>then apply to prediction in C, SQL, or some other language? All I'm
>looking for is some well-structured textual or data frame output that I
>can then manipulate with appropriate tools, whether it be S itself, or
>something like Perl.
>
>Thanks in advance for any suggestions (and apologies in advance if this
>is well documented somewhere!),
On Fri, 14 Feb 2003 08:25:05 +0000 (GMT), ripley at stats.ox.ac.uk said:> On Fri, 14 Feb 2003 j+rhelp at howard.fm wrote: > > Thanks for the suggestion. After my last post I tried switching from > > SPLUS to R and discovered the useful xlevels attribute, which when output > > Eh? S-PLUS has an "xlevels" attribute, but R has an "xlevels" component > (speaks the author of both). >Thanks for the clarification.> You add methods to functions, not classes, in R. You could indeed add > generic accessor functions with lm methods, but their absence (and the > lack of documentation of the internal structure) should alert you to the > idea that this is internal structure and not part of a public API. >OK, I'll stay away from that then...> > In SPLUS I came across a useful attribute 'assign', which has a mapping > > of term names to variables - the same attribute in R doesn't appear to > > provide this information. Is this available somewhere? > > Both have an assign *component*, not attribute. R has a mapping from > model.matrix columns to terms in its assign component: that's not an > accurate description of S-PLUS's component .... >Now that I look again more closely I see what you mean. Sorry for the missing that. -- Jeremy Howard jhoward at fastmail.fm