Therneau, Terry M., Ph.D.
2014-Jun-17 14:38 UTC
[R] prediction based on conditional logistic regression, clogit
I ususally scan the digest for "surv", so missed your question on the first round. You caught predict with a case that I never thought of; I'll look into making it smarter. As Peter D said, the clogit function simply sets up a special data set and then calls coxph, and is based on an identity that the likelihood for the conditional logistic is identical to the likelihood of a Cox model for a specially structured data set. I vacillate on whether this identity is just a mathematical accident or the mark of some deep truth. However it did allow us to use existing code rather than write new routine from scratch. In this special data set all of the follow-up times are set to 1 (any constant value would do). For normal Cox regression a predicted "expected number of events" depends on the length of follow-up for the subject, so the predict routine expects to find a follow-up time variable in your newdata. In my code, "rep(1, 150L)" is being mistaken for the name of the time variable, and failure of the confused routine ensues in due course. For the interim: in a conditional logistic the "expected number of events" is just exp(eta)/(1 + exp(eta)) where eta is the linear predictor. There is no follow-up time to worry about and predict.coxph would have done way too much work, even if it hadn't gotton confused. Use risk <- predict(fit, newdata=dat.test, type='risk') risk / (1+risk) Terry Therneau> >> >Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example: >> > >> > >>> >>dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5)) >>> >>dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5)) >>> >>fit<-clogit(status~x1+x2+strata(set),dat) >>> >>predict(fit,newdata=dat.test,type='expected') >> >Error in Surv(rep(1, 150L), status) : >> >? Time and status are different lengths >> > >> >Can anyone suggest what's wrong here? >> >
peter dalgaard
2014-Jun-18 14:51 UTC
[R] prediction based on conditional logistic regression, clogit
On 17 Jun 2014, at 16:38 , Therneau, Terry M., Ph.D. <therneau at mayo.edu> wrote:> As Peter D said, the clogit function simply sets up a special data set and then calls coxph, and is based on an identity that the likelihood for the conditional logistic is identical to the likelihood of a Cox model for a specially structured data set. I vacillate on whether this identity is just a mathematical accident or the mark of some deep truth.My take on this is that it is the latter, but in reverse: The Cox likelihood works by splitting the event history into a series of conditional experiments: Given a risk set and that one member must die, work out the likelihood that it is the member that is observed to die. These experiments are clearly not independent since the risk set of one experiment depends on the outcome of the previous ones. However, arguably, it is still sensible to cumulate the information represented by the log-likelihood contributions. I.e. you can do condtional logistic regression with a Cox likelihood because the Cox likelihood _is_ a conditional logistic regression likelihood. -- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com