array chip
2014-Jun-16 03:22 UTC
[R] prediction based on conditional logistic regression clogit
Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example:> dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5)) > dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5)) > fit<-clogit(status~x1+x2+strata(set),dat) > predict(fit,newdata=dat.test,type='expected')Error in Surv(rep(1, 150L), status) : Time and status are different lengths Can anyone suggest what's wrong here? Thanks! John [[alternative HTML version deleted]]
peter dalgaard
2014-Jun-16 09:22 UTC
[R] prediction based on conditional logistic regression clogit
On 16 Jun 2014, at 05:22 , array chip <arrayprofile at yahoo.com> wrote:> Hi, I am using clogit() from survival package to do conditional logistic regression. I also need to make prediction on an independent dataset to calculate predicted probability. Here is an example: > > >> dat <- data.frame(set=rep(1:50,each=3), status=rep(c(1,0,0),50), x1=rnorm(150,5,1), x2=rnorm(150,7,1.5)) >> dat.test <- data.frame(set=rep(1:30,each=3), status=rep(c(1,0,0),30), x1=rnorm(90,5,1), x2=rnorm(90,7,1.5)) >> fit<-clogit(status~x1+x2+strata(set),dat) >> predict(fit,newdata=dat.test,type='expected') > Error in Surv(rep(1, 150L), status) : > Time and status are different lengths > > Can anyone suggest what's wrong here? >The direct cause is that clogit() works by using the fact that the likelihood is equivalent to a coxph() likelihood with stratification and all observation lengths set to 1. Therefore the analysis is formally on Surv(rep(1, 150L), status) and that goes belly-up if you apply the same formula to a data set of different length. However, notice that there is no such thing as predict.clogit(), so you are attempting predict.coxph() on a purely formal Cox model. It is unclear to what extent predicted values, in the sense of coxph() are compatible with predictions in conditional logit models. I'm rusty on this, but I think what you want is something like m <- model.matrix(~ x1 + x2 - 1, data=dat.test) pp <- exp(m %*% coef(fit)) pps <- ave(pp, dat.test$set, FUN=sum) pp/pps i.e. the conditional probability that an observation is a case given covariates and that there is on case in each set (in the data given, you have sets of three with one being a case, so all predicted probabilities are close to 0.33). For more general matched sets, I'm not really sure what one wants. Real experts are welcome to chime in. -pd> Thanks! > > John > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com