Terry Therneau
2012-Jul-18 13:06 UTC
[R] Building a web risk calculator based on Cox, PH--definitive method for calculating probability?
Here is an example of how to do it. > library(survival) > vfit <- coxph(Surv(time, status) ~ celltype + trt, data=veteran) > userinput <- data.frame(celltype="smallcell", trt = 1) > usercurve <- survfit(vfit, newdata=userinput) #the entire predicted survival curve > user2 <- summary(usercurve, time= 2*365.25) # 2 year time point > user2$surv [1] 0.0007438084 Some comments: 1. The summary function for survival curves was written so that people could print out shorter summaries, but it works nicely for picking off a particular time point. Since the curve is a step function and there isn't likely to be a step at exactly "x" years, this is a bit more work to do yourself. You might want to include the confidence limits in your web report as well. 2. Put the whole formula into your coxph call. I have never ever understood why people use the construct tempvar <- with(data, Surv(time, status)) coxph(tempvar ~ age + sex + .... It leaves you with harder to read code, poorer documentation (the printout from coxph no longer shows the actual response variable), leads to hard-to-diagnose failures for certain uses of predict, ... the list goes on. I have not yet thought of a single good reason for doing it other than "because you can". 3. Make the user data the same as the original. In the veteran cancer data set "trt" is a numeric 0/1 variable, you had it as a factor in the new data set. 4. Your should get your keyboard fixed -- it appears that the spacebar is disabled when writing code :-) 5. If you plot the survival curve for the veterans cancer data set it only reaches to about 2 1/2 years, so the summary for 5 years will return NULL. Terry Therneau On 07/18/2012 05:00 AM, r-help-request at r-project.org wrote:> I am a medical student and as a capstone for my summer research project I am > going to create a simple online web "calculator" for users to input their > relevant data, and a probability of relapse within 5 years will be computed > and returned based on the Cox PH model I have developed. > > The issue I'm having is finding a definitive method/function to feed the > user's "newdata" and return the probability of relapse within 5 years. I > have googled this and the answers seems to be inconsistent; I have variously > seen people recommend survest(), survfit(), and predict.coxph(). Terry had > a useful answer > http://r.789695.n4.nabble.com/how-to-calculate-predicted-probability-of-Cox-model-td4515265.html > here but I didn't quite understand what he meant in his last sentence. > > Here is some code for you to quickly illustrate what you suggest. > > library(rms) > library(survival) > library(Hmisc) > data(veteran) > dd=datadist(veteran) > options(datadist='dd') > options(digits=4) > obj=with(veteran,Surv(time,status)) > vetcoxph=coxph(obj~celltype+trt,data=veteran) #I will fit models from > both the survival and rms packages so you can > #use what you like > vetcph=cph(obj~celltype+trt,data=veteran,surv=TRUE,time.inc=5*365,x=T,y=T) > #let's say the user inputted that their cell type was smallcell and their > treatment was "1". > userinput=data.frame(celltype='smallcell',trt=factor(1)) > > I really appreciate your recommendations > > Best, > Jahan