Ferenci Tamas
2019-Aug-18 17:10 UTC
[R] results of a survival analysis change when converting the data to counting process format
Dear All,
Consider the following simple example:
library( survival )
data( veteran )
coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )
trt prior karno
0.180197194 -0.005550919 -0.033771018
Note that we have neither time-dependent covariates, nor time-varying
coefficients, so the results should be the same if we change to
counting process format, no matter where we cut the times.
That's true if we cut at event times:
veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno,
data = veteran, cut = unique( veteran$time ) )
coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran2 ) )
trt prior karno
0.180197194 -0.005550919 -0.033771018
But quite interestingly not true, if we cut at every day:
veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
data = veteran, cut = 1:max(veteran$time) )
coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran3 ) )
trt prior karno
0.180197215 -0.005550913 -0.033771016
The difference is not large, but definitely more than just a rounding
error, or something like that.
What's going on? How can the results get wrong, especially by
including more cutpoints?
Thank you in advance,
Tamas
Göran Broström
2019-Aug-22 19:48 UTC
[R] results of a survival analysis change when converting the data to counting process format
On 2019-08-18 19:10, Ferenci Tamas wrote:> Dear All, > > Consider the following simple example: > > library( survival ) > data( veteran ) > > coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) ) > trt prior karno > 0.180197194 -0.005550919 -0.033771018 > > Note that we have neither time-dependent covariates, nor time-varying > coefficients, so the results should be the same if we change to > counting process format, no matter where we cut the times. > > That's true if we cut at event times: > > veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno, > data = veteran, cut = unique( veteran$time ) ) > > coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran2 ) ) > trt prior karno > 0.180197194 -0.005550919 -0.033771018 > > But quite interestingly not true, if we cut at every day: > > veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno, > data = veteran, cut = 1:max(veteran$time) ) > > coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran3 ) ) > trt prior karno > 0.180197215 -0.005550913 -0.033771016 > > The difference is not large, but definitely more than just a rounding > error, or something like that. > > What's going on? How can the results get wrong, especially by > including more cutpoints?All results are wrong, but they are useful (paraphrasing George EP Box). G?ran> > Thank you in advance, > Tamas > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Göran Broström
2019-Aug-23 09:12 UTC
[R] results of a survival analysis change when converting the data to counting process format
Den 2019-08-22 kl. 21:48, skrev G?ran Brostr?m:> > > On 2019-08-18 19:10, Ferenci Tamas wrote: >> Dear All, >> >> Consider the following simple example: >> >> library( survival ) >> data( veteran ) >> >> coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) ) >> ????????? trt??????? prior??????? karno >> ? 0.180197194 -0.005550919 -0.033771018 >> >> Note that we have neither time-dependent covariates, nor time-varying >> coefficients, so the results should be the same if we change to >> counting process format, no matter where we cut the times. >> >> That's true if we cut at event times: >> >> veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno, >> ??????????????????????? data = veteran, cut = unique( veteran$time ) ) >> >> coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = >> veteran2 ) ) >> ????????? trt??????? prior??????? karno >> ? 0.180197194 -0.005550919 -0.033771018 >> >> But quite interestingly not true, if we cut at every day: >> >> veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno, >> ??????????????????????? data = veteran, cut = 1:max(veteran$time) ) >> >> coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = >> veteran3 ) ) >> ????????? trt??????? prior??????? karno >> ? 0.180197215 -0.005550913 -0.033771016 >> >> The difference is not large, but definitely more than just a rounding >> error, or something like that. >> >> What's going on? How can the results get wrong, especially by >> including more cutpoints? > > All results are wrong, but they are useful (paraphrasing George EP Box).That said, it is a little surprising: The generated risk sets are (should be) identical in all cases, and one would expect rounding errors to be the same. But data get stored differently, and ... who knows? I tried your examples on my computer and got exactly the same results as you. Which surprised me. G,> > G?ran > >> >> Thank you in advance, >> Tamas >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.