David James
2016-Sep-30 14:26 UTC
[R] isssues with predict.coxph, offset, type = "expected", and newdata?
Hi, It seems there might be two issues with predict.coxph(), and I'd appreciate confirmation before submitting a bug report to the package author. (1) predict() seems to produce incorrect predictions when using type = "expected" from a Cox model with an offset and specifying a new data for prediction (see Example 1 below). (2) predict() produces an error for a call with a null model and a newdata= argument (see Example 2 below). Please note that I didn't check these cases with additional complications, like strata(), cluster(), etc. SessionInfo() at the bottom of this email (notice I'm using survival 2.39-5, the current version in CRAN). Thanks, David -------------------------------------------------------------------------------------------------------------- Example 1> library("survival") > data(lung) > > # simulate a three-fold genetic effect (worsening) > set.seed(123) > lung$gene.eff <- sample(log(c(1, 3)), size=nrow(lung), replace=TRUE, prob=c(0.5, 0.5)) > > # fix the gene effect, do not estimate it > m1 <- coxph(Surv(time, status) ~ age + sex + offset(gene.eff), data = lung) > > # two hypothetical individuals differing only on gene.eff > nd <- expand.grid(time=180, status=1, age=65, sex=1, gene.eff=log(c(1,3))) > > p1.lp <- predict(m1, newdata = nd, type = "lp") > p1.rsk <- predict(m1, newdata = nd, type = "risk") > p1.exp <- predict(m1, newdata = nd, type = "expected") > > # output from type "lp" (linear predictor) and "risk" are ok, > # but not from type "expected" > all.equal(3, exp(p1.lp[2]-p1.lp[1]), check.names = FALSE)[1] TRUE> all.equal(3, p1.rsk[2]/p1.rsk[1], check.names = FALSE)[1] TRUE> all.equal(3, p1.exp[2]/p1.exp[1], check.names = FALSE)[1] "Mean relative difference: 0.6666667"> > # notice that predict() produces the same expected number of > # events for both hypothetical individuals > print(p1.exp)[1] 0.1882663 0.1882663 Example 2> # null model > m0 <- coxph(Surv(time, status) ~ 1, data = lung) > > # time points at which we want predictions > nd <- data.frame(time=seq(from=0, to=360, by=30), status=rep(1, 13)) > > # predictions are produced for types "lp" and "risk", but not "expected" > p0.lp <- predict(m0, type="lp", newdata = nd) > p0.rsk <- predict(m0, type="risk", newdata = nd) > p0.exp <- predict(m0, type="expected", newdata = nd)Error in newx %*% object$coef : requires numeric/complex matrix/vector arguments>Session Info:> sessionInfo()R version 3.3.1 (2016-06-21) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] survival_2.39-5 loaded via a namespace (and not attached): [1] Matrix_1.2-6 tools_3.3.1 splines_3.3.1 grid_3.3.1 lattice_0.20-33