Dear Jan, Thank you very much for your excellent description of the problem and the self-contained test code. This is a problem that I've been meaning to either document better or solve for some time. The root of the problem is with the builtin S-Plus terms.inner function:> attr(terms.inner(asthma ~ pol(age,kx) + smok),'variables')expression(age, kx, smok) You can see that terms.inner inappropriately includes kx as an independent variable as it does not know that the first argument to pol is the special variable. When a constant replaces kx, all is well. As I am relying on the C code called by terms.inner to do the job, I don't have a ready solution. I would be happy if someone comes up with a solution. The all.vars function in the R language has the same limitation: all.vars(asthma ~ pol(age,kx) + smok) [1] "asthma" "age" "kx" "smok" Frank Harrell Jan Brogger wrote:> > I get funny behaviour from lrm when using transformations that require a > parameter, and the parameter is a variable. It only happens when there are > other variables in the model. The transformation must be the last term in > the formula, or some labels will be wrong. For example, if the correct > labels are "age" "age^2" and "smok", the labels will be "age", "age^2" and > "kx" where "kx" is actually the parameter for the transformation, and it > should be "smok". > > I have tried including the variable that contains the parameter for the > transformation with datadist, but it doesn't help. Also tried reinstalling > newest Hmisc, Design and starting new project directory. Platform: S+2000 > Win98. It happens to both rcs and pol. With glm, everything is OK. The > value of coefficients are the same. > > It isn't a major issue, but it might puzzle some and I thought I'd let you > know. Solutions are: specify the parameters transformations directly, not > in variables, or keep your transformations last in the formula. Any better > solutions ? It'd be nice to be able to keep knot locations in a variable, > so you can change it only one place in your code. > > This code reproduces the problem: > library(Hmisc,T) > library(Design,T) > age <- rnorm(500,41.5,12) #age > smok <- sample(0:2,500,rep=T) #smoking habit > asthma <- sample(0:1,500,rep=T) #asthma > mydat <- data.frame(age,smok,asthma) > dd <- datadist(mydat) > options(datadist="dd") > kx <- 2 > > #this model gives the wrong labels > lrm(asthma~pol(age,kx)+smok,data=mydat) > #this model gives the right label > lrm(asthma~smok+pol(age,kx),data=mydat) > > #If you don't use a variable, but a value, then everything is ok > lrm(asthma~smok+pol(age,2),data=mydat) > lrm(asthma~pol(age,2)+smok,data=mydat) > #If you only have one term, then everything is OK > lrm(asthma~pol(age,2),data=mydat) > #It doesn't happen with GLM: > glm(asthma~pol(age,2)+smok,data=mydat,family=binomial) > #but it does with rcs: > kx <- c(15,40,70) > lrm(asthma~rcs(age,kx)+smok,data=mydat) > lrm(asthma~smok+rcs(age,kx),data=mydat) > > Jan Brogger, > University of Bergen, Norway > --------------------------------------------------------------------- > This message was distributed by s-news at lists.biostat.wustl.edu. To > unsubscribe send e-mail to s-news-request at lists.biostat.wustl.edu with > the BODY of the message: unsubscribe s-news-- Frank E Harrell Jr Prof. of Biostatistics & Statistics Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences U. Virginia School of Medicine http://hesweb1.med.virginia.edu/biostat -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Sat, 28 Jul 2001 fharrell at virginia.edu wrote:> Dear Jan, > > Thank you very much for your excellent description of the > problem and the self-contained test code. This is a > problem that I've been meaning to either document better > or solve for some time. The root of the problem is with > the builtin S-Plus terms.inner function: > > > attr(terms.inner(asthma ~ pol(age,kx) + smok),'variables') > expression(age, kx, smok) > > You can see that terms.inner inappropriately includes kx > as an independent variable as it does not know that > the first argument to pol is the special variable. > When a constant replaces kx, all is well. > > As I am relying on the C code called by terms.inner to > do the job, I don't have a ready solution. I would > be happy if someone comes up with a solution. The > all.vars function in the R language has the same > limitation: > > all.vars(asthma ~ pol(age,kx) + smok) > [1] "asthma" "age" "kx" "smok"Well, 1) all.vars comes from S, and works the same under R. 2) I don't think it is a limitation. all.vars is described as Description: Return a character vector containing all the names which occur in an expression or call. and kx is such a name. Indeed, there is no way to know from just the formula if it is length n-vector or a scalar. I don't know what pol() is, but guess it is from your library, in which case the interpretation may well depend on where (if anywhere) that library is in the search path. If you want special-purpose names in formulae (like strata) you need to extend the formula-handling code. Neither S4 nor R make that easy. Brian -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Sat, 28 Jul 2001 fharrell at virginia.edu wrote:> Dear Jan, > > Thank you very much for your excellent description of the > problem and the self-contained test code. This is a > problem that I've been meaning to either document better > or solve for some time. The root of the problem is with > the builtin S-Plus terms.inner function: > > > attr(terms.inner(asthma ~ pol(age,kx) + smok),'variables') > expression(age, kx, smok) > > You can see that terms.inner inappropriately includes kx > as an independent variable as it does not know that > the first argument to pol is the special variable. > When a constant replaces kx, all is well. >I think this can't be fixed in general. There is simply no way to know whether pol(age,kx) contains two variables (like interaction(age,kx)) or one variable, and if so, which is the variable and which the parameter. The termplot() function has a function carrier.names() that guesses that the first argument is the only variable, which is a useful heuristic until you have log(0.5+x) as a term. You can get the results of all.vars broken down by term like> fI(log(0.5 + x)) ~ pol(age, kx) + ns(sbp, df) + factor(race, labels races) + sex> av<-all.vars(f) > sapply(attr(terms(f),"term.labels"),function(term)av[match(all.vars(parse(text=term)),av)]) $"pol(age, kx)" [1] "age" "kx" $"ns(sbp, df)" [1] "sbp" "df" $"factor(race, labels = races)" [1] "race" "races" $sex [1] "sex" (actually you only get the RHS of the formula, but that shouldn't be hard to fix) You might also look at how nlme and Jim Lindsey's nonlinear models functions solve the question of whether a name refers to a a variable or a parameter. -thomas Thomas Lumley Asst. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Seemingly Similar Threads
- "Unable to fit" error message from the lrm function in the rms library
- when vectorising does not work: silent function fail?
- Extracting P-values from the lrm function in the rms library
- Rd syntax error detected in CRAN daily checks
- Newbie clustering/classification question