Hi, I am experimenting with the function predict() in two versions of R and the R extension package "survival". library(survival) set.seed(123) testdat=data.frame(otime=rexp(10),event=rep(0:1,each=5),x=rnorm(10)) testfm=as.formula('Surv(otime,event)~x') testfun=function(dat,fm) { predict(coxph(fm,data=dat),type='lp',newdata=dat) } # Under R 2.11.1 and survival_2.35-8 testfun(testdat,testfm)> testfun(testdat,testfm)[,1] 1 -0.34786190 2 0.53622182 3 0.07861553 4 0.10030148 5 -0.05329258 6 -0.40619876 7 0.83422564 8 0.15170217 9 -1.15316613 10 0.25945273> sessionInfo()R version 2.11.1 (2010-05-31) x86_64-pc-linux-gnu locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] survival_2.35-8 # Under R 2.12.2 and survival_2.36-5> testfun(testdat,testfm)Error in model.frame(formula = fm, data = dat) : object 'fm' not found> sessionInfo()R version 2.12.2 (2011-02-25) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] survival_2.36-5 Any enlightenment on what might be causing the different performance? Regards Ivo
--- begin included message --- I am experimenting with the function predict() in two versions of R and the R extension package "survival". library(survival) set.seed(123) testdat=data.frame(otime=rexp(10),event=rep(0:1,each=5),x=rnorm(10)) testfm=as.formula('Surv(otime,event)~x') testfun=function(dat,fm) { predict(coxph(fm,data=dat),type='lp',newdata=dat) } -- end inclusion ---- The question was: this works under survival 2.35-8, but not survival 2.36-5 Answer: The predict and underlying model.frame functions for coxph were brought into line with lm and other models. The major advantage is that I now deal with factors and data dependent predictors like ns() correctly. You've shown a disadvantage of which I was not aware. Using your example but replacing coxph() by lm() with otime ~x as the model I get a similar failure. I'd like to ask a wider audience of R-devel since it is bigger than coxph. Terry Therneau
On Fri, 2011-04-15 at 09:10 +0200, peter dalgaard wrote:> I couldn't reproduce it from Terry's description either, but there > _is_ an issue which parallels the ....I'll start with an apology: as a first guess to understand the problem with predict.coxph I tried something parallel to Ivo's example using lm, during a 5 minute slice of time between meetings. I got a similar error message, posted a query based on that, and ran off. The real root of my error message, however, was a simple typographical mistake. My apologies for the premature note, whose error was gently pointed out by all three of you. I should have waited till I had more time. Now for the real problem, which is an oversight in my design. When updating the predict.coxph, residuals.coxph, survfit.coxph and survexp.coxph functions to more modern processing of the newdata argument (proper handling of factors, etc), I made a decision to have all of them call model.frame.coxph and model.matrix.coxph. Model.matrix in particular has some special steps, and it would be better to have only one point of modification. However, this introduces one more level of indirection predict >- model.frame.coxph -> model.frame and the parent.frame() reference in model.frame.coxph now points to predict when we actually want it to point to the parent of predict. In predict.lm the parent.frame() argument lives in the predict.lm code. I see three paths to correction: 1. Make model.frame.coxph smarter. Peter's example seems to show that this is hard. 2. Change the line in predict.coxph (and the others) mf <- model.frame(object) to some form of eval() that causes the parent.frame() operation to reach back properly. I'm not yet sure what variant of eval or do.call or ... this is; the effect I'm looking for is similar to what NextMethod() does. 3. Change my call to mf <- model.frame(object$terms, ... mimic the call in lm This may be the simplest, since model.frame.coxph does not do anything special. Last, I need to check whether any of the above issues affect model.matrix calls. Any comments on which is the better path would be welcomed. Terry T.
An addition to my prior post: my option 3 is not as attractive as I thought. In several cases predict.coxph needs to reconstruct the original data frame, for things that were not saved in the coxph object. The half dozen lines to redo the orignal call with the same data, na.action, etc options is simple, but a nuisance to reproduce in multiple spots. Terry