thr3ads.net - R help - [R] changes in coxph in "survival" from older version? [May 2011]

If this information is useful, please help other people find it:
Share via:

Shi, Tao

2011-May-11 23:11 UTC

[R] changes in coxph in "survival" from older version?

Hi all,

I found that the two different versions of "survival" packages, namely
2.36-5
vs. 2.36-8 or later, give different results for coxph function.  Please see 
below and the data is attached.  The second one was done on Linux, but Windows 
gave the same results.  Could you please let me know which one I should trust?

Thanks,

...Tao





#####============================ R2.13.0, survival 2.36-9 
================================> dat=read.csv("tmp1.csv",
header=T)> fit2 <- coxph(Surv(tt, cens) ~., data=dat)Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
  Ran out of iterations and did not converge> summary(fit2)## the estimates are different
.....> sessionInfo()R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United 
States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252


attached base packages:
[1] grDevices datasets  splines   graphics  stats     tcltk     utils     
methods   base     


other attached packages:
[1] svSocket_0.9-51 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     
survival_2.36-9

loaded via a namespace (and not attached):
[1] cluster_1.13.3  grid_2.13.0     lattice_0.19-23 svMisc_0.9-61   
tools_2.13.0  

#####============================================ 
================================

#####============================ R2.12.2, survival 2.36-5 
================================> dat=read.csv("tmp1.csv",
header=T)> fit2 <- coxph(Surv(tt, cens) ~., data=dat)
> summary(fit2)## the estimates are different
.....> 
> sessionInfo()R version 2.12.2 (2011-02-25)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] survival_2.36-5
#####============================================ 
=================================

Terry Therneau

2011-May-12 13:42 UTC

head link

[R] changes in coxph in "survival" from older version?

On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote:> Hi all,
> 
> I found that the two different versions of "survival" packages,
namely 2.36-5
> vs. 2.36-8 or later, give different results for coxph function.  Please see
> below and the data is attached.  The second one was done on Linux, but
Windows
> gave the same results.  Could you please let me know which one I should
trust?
> 
> Thanks,
 In your case, neither.  Your data set has 22 events and 17 predictors;
the rule of thumb for a reliable Cox model is 10-20 events per predictor
which implies no more than 2 for your data set.  As a result, the
coefficients of your model have very wide confidence intervals, the coef
for Male for instance has se of 3.26, meaning the CI goes from 1/26 to
26 times the estimate; i.e., there is no biological meaning to the
estimate.

  Nevertheless, why did coxph give a different answer?  The later
version 2.36-9 failed to converge (20 iterations) with a final
log-likelihood of -19.94, the earlier code converges in 10 iterations to
-19.91.  In version 2.36-6 an extra check was put into the maximizer for
coxph in response to an exceptional data set which caused the routine to
fail due to overflow of the exp function; the Newton-Raphson iteration
algorithm had made a terrible guess in it's iteration path, which can
happen with all NR based search methods.  
   I put a limit on the size the linear predictor in the Cox model of
21.  The basic argument is that exp(linear-predictor) = relative risk
for a subject, and that there is not much biological meaning for risks
to be less than exp(-21) ~ 1/(population of the earh).  There is more to
the reasoning, interested parties should look at the comments in
src/coxsafe.c, a 5 line routine with 25 lines of discussion.  I will
happily accept input the "best" value for the constant.

   I never expected to see a data set with both convergence of the LL
and linear predictors larger than +-15.  Looking at the fit (older
code)> round(fit2$linear.predictor, 2) [1]   2.26   0.89   4.96 -19.09 -12.10   1.39   2.82   3.10
 [9]  18.57 -25.25  22.94   8.75   5.52 -27.64  14.88 -23.41
[17]  13.70 -28.45  -1.84  10.04  12.62   2.54   6.33  -8.76
[25]   9.68   4.39   2.92   3.51   6.02 -17.24   5.97

This says that, if the model is to be believed, you have several near
immortals in the data set. (Everyone else on earth will perish first).

Terry Therneau

Terry Therneau

2011-May-17 13:23 UTC

head link

[R] changes in coxph in "survival" from older version?

-- begin included message ---
I did realize that there are way more predictors in the model.  My
initial thinking was use that as an initial model for stepwise model
selection.  Now I  wonder if the model selection result is still valid
if the initial model didn't even converge?
--- end inclusion ---

You have 17 predictors with only 22 events.  All methods of "variable
selection" in such a scenario will give essentially random results.
There is simply not enough information present to determine a best
predictor or best subset of predictors.  

Terry Therneau

Possibly Parallel Threads

Search for more seemingly similar threads

R help - May 2011 - changes in coxph in "survival" from older version?

[R] changes in coxph in "survival" from older version?

[R] changes in coxph in "survival" from older version?

[R] changes in coxph in "survival" from older version?

Possibly Parallel Threads