Ferenci Tamas
2019-Aug-18 17:10 UTC
[R] results of a survival analysis change when converting the data to counting process format
Dear All,

Consider the following simple example:

library( survival )
data( veteran )

coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) )
        trt        prior        karno
0.180197194 -0.005550919 -0.033771018

Note that we have neither time-dependent covariates, nor time-varying
coefficients, so the results should be the same if we change to
counting process format, no matter where we cut the times.

That's true if we cut at event times:

veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno,
                       data = veteran, cut = unique( veteran$time ) )

coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran2 ) )
        trt        prior        karno
0.180197194 -0.005550919 -0.033771018

But quite interestingly not true, if we cut at every day:

veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno,
                       data = veteran, cut = 1:max(veteran$time) )

coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = veteran3 ) )
        trt        prior        karno
0.180197215 -0.005550913 -0.033771016

The difference is not large, but definitely more than just a rounding
error, or something like that.

What's going on? How can the results get wrong, especially by
including more cutpoints?

Thank you in advance,
Tamas
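One hypothesis, not confirmed anywhere in this thread, is that the discrepancy in the eighth decimal comes from where the Newton-Raphson iteration stops rather than from a different model. The default relative convergence tolerance in `coxph.control()` is `eps = 1e-09`. The sketch below refits the original and the daily-split data with a tighter tolerance and prints the largest coefficient difference. The object names (`vet`, `vet_daily`, `tight`, `max_abs_diff`) and the chosen values `eps = 1e-12` and `iter.max = 50` are illustrative assumptions, not something the poster used.

```r
library(survival)

# Use the lazily loaded data set shipped with the survival package.
vet <- survival::veteran

# Daily split, as in the post (counting process format).
vet_daily <- survSplit(Surv(time, status) ~ trt + prior + karno,
                       data = vet, cut = 1:max(vet$time))

# Assumed, illustrative settings: tighter than the default eps = 1e-09.
tight <- coxph.control(eps = 1e-12, iter.max = 50)

fit_orig  <- coxph(Surv(time, status) ~ trt + prior + karno,
                   data = vet, control = tight)
fit_daily <- coxph(Surv(tstart, time, status) ~ trt + prior + karno,
                   data = vet_daily, control = tight)

# Largest absolute difference between the two coefficient vectors.
max_abs_diff <- max(abs(coef(fit_orig) - coef(fit_daily)))
print(max_abs_diff)
```

If the printed difference shrinks markedly compared with the roughly 2e-08 seen in the post, that would point to the stopping rule; if it does not, the cause lies elsewhere.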
Göran Broström
2019-Aug-22 19:48 UTC
[R] results of a survival analysis change when converting the data to counting process format
On 2019-08-18 19:10, Ferenci Tamas wrote:
> What's going on? How can the results get wrong, especially by
> including more cutpoints?

All results are wrong, but they are useful (paraphrasing George EP Box).

Göran
Göran Broström
2019-Aug-23 09:12 UTC
[R] results of a survival analysis change when converting the data to counting process format
On 2019-08-22 at 21:48, Göran Broström wrote:
> All results are wrong, but they are useful (paraphrasing George EP Box).

That said, it is a little surprising: the generated risk sets are (should be) identical in all cases, and one would expect rounding errors to be the same. But data get stored differently, and ... who knows?

I tried your examples on my computer and got exactly the same results as you. Which surprised me.

G,
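The claim that the risk sets are (or should be) identical can be checked directly. The following is a minimal sketch, assuming the usual (tstart, time] interval convention of counting process data; the helper names (`vet`, `vet_daily`, `n_risk_orig`, and so on) are hypothetical and chosen only for illustration. For each distinct event time it counts the subjects at risk and the events, once in the original data and once in the daily-split data.

```r
library(survival)

vet <- survival::veteran

# Daily split, as in the original post.
vet_daily <- survSplit(Surv(time, status) ~ trt + prior + karno,
                       data = vet, cut = 1:max(vet$time))

# Distinct times at which at least one event occurs.
event_times <- sort(unique(vet$time[vet$status == 1]))

# Size of the risk set at each event time.
# Original data: a subject is at risk at t if time >= t.
n_risk_orig <- sapply(event_times, function(t) sum(vet$time >= t))
# Split data: a row is at risk at t if tstart < t <= time.
n_risk_split <- sapply(event_times, function(t)
  sum(vet_daily$tstart < t & vet_daily$time >= t))

# Number of events at each event time.
n_event_orig <- sapply(event_times, function(t)
  sum(vet$status == 1 & vet$time == t))
n_event_split <- sapply(event_times, function(t)
  sum(vet_daily$status == 1 & vet_daily$time == t))

print(identical(n_risk_orig, n_risk_split))
print(identical(n_event_orig, n_event_split))
```

Matching counts only show that the risk sets have the same size at each event time, not that the floating-point sums over them are accumulated in the same order, so this check cannot by itself rule out an arithmetic explanation.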