Hi all, I found that the two different versions of "survival" packages, namely 2.36-5 vs. 2.36-8 or later, give different results for coxph function. Please see below and the data is attached. The second one was done on Linux, but Windows gave the same results. Could you please let me know which one I should trust? Thanks, ...Tao #####============================ R2.13.0, survival 2.36-9 ================================> dat=read.csv("tmp1.csv", header=T)> fit2 <- coxph(Surv(tt, cens) ~., data=dat)Warning message: In fitter(X, Y, strats, offset, init, control, weights = weights, : Ran out of iterations and did not converge> summary(fit2)## the estimates are different .....> sessionInfo()R version 2.13.0 (2011-04-13) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] grDevices datasets splines graphics stats tcltk utils methods base other attached packages: [1] svSocket_0.9-51 TinnR_1.0.3 R2HTML_2.2 Hmisc_3.8-3 survival_2.36-9 loaded via a namespace (and not attached): [1] cluster_1.13.3 grid_2.13.0 lattice_0.19-23 svMisc_0.9-61 tools_2.13.0 #####============================================ ================================ #####============================ R2.12.2, survival 2.36-5 ================================> dat=read.csv("tmp1.csv", header=T)> fit2 <- coxph(Surv(tt, cens) ~., data=dat) > summary(fit2)## the estimates are different .....> > sessionInfo()R version 2.12.2 (2011-02-25) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] survival_2.36-5 #####============================================ =================================
Terry Therneau
2011-May-12 13:42 UTC
[R] changes in coxph in "survival" from older version?
On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote:> Hi all, > > I found that the two different versions of "survival" packages, namely 2.36-5 > vs. 2.36-8 or later, give different results for coxph function. Please see > below and the data is attached. The second one was done on Linux, but Windows > gave the same results. Could you please let me know which one I should trust? > > Thanks,In your case, neither. Your data set has 22 events and 17 predictors; the rule of thumb for a reliable Cox model is 10-20 events per predictor which implies no more than 2 for your data set. As a result, the coefficients of your model have very wide confidence intervals, the coef for Male for instance has se of 3.26, meaning the CI goes from 1/26 to 26 times the estimate; i.e., there is no biological meaning to the estimate. Nevertheless, why did coxph give a different answer? The later version 2.36-9 failed to converge (20 iterations) with a final log-likelihood of -19.94, the earlier code converges in 10 iterations to -19.91. In version 2.36-6 an extra check was put into the maximizer for coxph in response to an exceptional data set which caused the routine to fail due to overflow of the exp function; the Newton-Raphson iteration algorithm had made a terrible guess in it's iteration path, which can happen with all NR based search methods. I put a limit on the size the linear predictor in the Cox model of 21. The basic argument is that exp(linear-predictor) = relative risk for a subject, and that there is not much biological meaning for risks to be less than exp(-21) ~ 1/(population of the earh). There is more to the reasoning, interested parties should look at the comments in src/coxsafe.c, a 5 line routine with 25 lines of discussion. I will happily accept input the "best" value for the constant. I never expected to see a data set with both convergence of the LL and linear predictors larger than +-15. Looking at the fit (older code)> round(fit2$linear.predictor, 2)[1] 2.26 0.89 4.96 -19.09 -12.10 1.39 2.82 3.10 [9] 18.57 -25.25 22.94 8.75 5.52 -27.64 14.88 -23.41 [17] 13.70 -28.45 -1.84 10.04 12.62 2.54 6.33 -8.76 [25] 9.68 4.39 2.92 3.51 6.02 -17.24 5.97 This says that, if the model is to be believed, you have several near immortals in the data set. (Everyone else on earth will perish first). Terry Therneau
Terry Therneau
2011-May-17 13:23 UTC
[R] changes in coxph in "survival" from older version?
-- begin included message --- I did realize that there are way more predictors in the model. My initial thinking was use that as an initial model for stepwise model selection. Now I wonder if the model selection result is still valid if the initial model didn't even converge? --- end inclusion --- You have 17 predictors with only 22 events. All methods of "variable selection" in such a scenario will give essentially random results. There is simply not enough information present to determine a best predictor or best subset of predictors. Terry Therneau
Maybe Matching Threads
- coxph weirdness
- basehaz() in package survival and warnings with coxph
- basehaz() in package 'Survival' and warnings() with coxph
- Expected number of events, Andersen-Gill model fit via coxph in package survival
- predicted time length differs from survfit.coxph: