thr3ads.net - R help - [R] results of a survival analysis change when converting the data to counting process format [Aug 2019]

If this information is useful, please help other people find it:
Share via:

Ferenci Tamas

2019-Aug-23 14:39 UTC

[R] results of a survival analysis change when converting the data to counting process format

Thanks for all those comments, and thanks Terry for the repair!

Tamas


2019. augusztus 23., 14:40:05, ?rtad:


I've spent some time chasing this down, and another error of similar type.  
It's repaired in version 3.0-9  (soon to go to CRAN, or at least that is the
plan  --- I'm working through one last fail in the checks.)

For those who want to know more---

When there is start, stop data such as this:  (0, 101]  (101, 123]   (123, 400],
....  the algorithm used by coxph is to work from largest to smallest time. 
Every time it hits the start of an interval (400, 123, 101) that observation is
added into the risk set, and whenever it hits the start of an interval the
observation is removed (123, 101, 0).    "Add" and "remove"
also means "add to the current estimate of the running mean and
variance" of the covariates.   At each event time that current mean and
variance are are used to update the loglik and its derivatives.

When a subject is diced into little peices of (1,2] (2,3] (3,4], ....  like the
last fit below, the sheer number of updates to the running mean/variance can
lead to accumulated round off error.   The code works hard to avoid this, and
some of that is subtle --- round off error is a tricky business --- and has gone
through several refinements.    Nevertheless, the result for this data set
should not have been that far off.  Another user example from about the same
time was worse so I've been focusing on that one: one of the two splits led
to an exp() overflow and one didn't, giving results that were completely
different.  This led to a more careful review and some changes that addressed
the example below as well.

Terry T.

On 8/23/19 5:00 AM, r-help-request at r-project.org wrote:

On 2019-08-18 19:10, Ferenci Tamas wrote:

Dear All,



Consider the following simple example:



library( survival )

data( veteran )



coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )

          trt        prior        karno

  0.180197194 -0.005550919 -0.033771018



Note that we have neither time-dependent covariates, nor time-varying

coefficients, so the results should be the same if we change to

counting process format, no matter where we cut the times.



That's true if we cut at event times:



veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno,

                        data = veteran, cut = unique( veteran$time ) )



coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran2 ) )

          trt        prior        karno

  0.180197194 -0.005550919 -0.033771018



But quite interestingly not true, if we cut at every day:



veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,

                        data = veteran, cut = 1:max(veteran$time) )



coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran3 ) )

          trt        prior        karno

  0.180197215 -0.005550913 -0.033771016



The difference is not large, but definitely more than just a rounding

error, or something like that.



What's going on? How can the results get wrong, especially by

including more cutpoints?

R help - Aug 2019 - results of a survival analysis change when converting the data to counting process format

[R] results of a survival analysis change when converting the data to counting process format