Dear List:

I have a data frame prepared in the counting process style for including
a binary time-dependent covariate. The first few rows look like this.

  PtNo Start   End Status Imp
1    1     0 608.0      0   0
2    2     0 513.0      0   0
3    2   513 887.0      0   1
4    3     0  57.0      0   0
5    3    57 604.0      0   1
6    4     0 150.0      1   0

The outcome is mortality and the covariate is for an implantable
defibrillator, so it is expected that the implant would reduce the
risk of death. The results of fitting coxph (survival package) are:

Call:
coxph(formula = Surv(Start, End, Status) ~ Imp, data = nina.excl)

     coef exp(coef) se(coef)     z    p
Imp 0.163      1.18    0.485 0.337 0.74

Likelihood ratio test=0.11  on 1 df, p=0.738  n= 335

Since this was unexpected, I created a non-counting process data
frame with an indicator variable for whether the patient ever received
an implant. Here are the results:

Call:
coxph(formula = Surv(Days, Dead) ~ Implant, data = nina.excl0)

         coef exp(coef) se(coef)     z       p
Implant -1.77     0.171    0.426 -4.15 3.3e-05

Likelihood ratio test=19.1  on 1 df, p=1.21e-05  n= 197

I found this degree of discrepancy surprising, especially the point
estimate of the coefficient. I have verified the data frames are
set up correctly.

Here is what I have tried in order to understand what is going on.

I tried fitting models adjusted for other covariates that I have in
the data frame. This did not appreciably affect the coefficients
for the implant variable.

I ran cox.zph on the two models shown above and plotted the results.
In both cases, the point estimate of Beta(t) is roughly parabolic:
the curves increase monotonically to a local maximum and then
decrease monotonically (the CIs are a bit more wiggly).

I would interpret this to mean that the effect of implant is probably
time-dependent. If so, how do I actually get a "proper" estimate of
beta(t) for a variable like this?

Are there some other things I should look at to understand what's
going on?

Here is my sessionInfo.
R version 2.5.0 (2007-04-23)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] "splines"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "methods"   "base"

other attached packages:
  cmprsk survival
 "2.1-7"   "2.31"

--
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.6057
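For readers reconstructing the layout described in the post, here is a minimal sketch of how a one-row-per-patient frame can be expanded into counting-process rows with tmerge() from current releases of the survival package (an assumption about the reader's installation; the 2.31 release listed in the sessionInfo predates it). The `wide` frame and its `Days`, `Dead` and `ImpDay` columns are hypothetical, back-filled from the six rows shown in the post, and this is not necessarily how the original data frame was built.

```r
library(survival)

# Hypothetical one-row-per-patient data implied by the six rows in the post:
# total follow-up (Days), death indicator (Dead), day of implant (NA = never).
wide <- data.frame(
  PtNo   = 1:4,
  Days   = c(608, 887, 604, 150),
  Dead   = c(0, 0, 0, 1),
  ImpDay = c(NA, 513, 57, NA)
)

# event() sets each patient's follow-up window and the end-of-interval status;
# tdc() splits the window at ImpDay and switches Imp from 0 to 1 from then on.
cp <- tmerge(wide, wide, id = PtNo,
             Status = event(Days, Dead),
             Imp    = tdc(ImpDay))

print(cp[, c("PtNo", "tstart", "tstop", "Status", "Imp")])
```

The resulting `tstart`, `tstop`, `Status` and `Imp` columns play the roles of `Start`, `End`, `Status` and `Imp` in the model call above.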
Kevin E. Thorpe wrote:
> I would interpret this to mean that the effect of implant is probably
> time-dependent. If so, how do I actually get a "proper" estimate of
> beta(t) for a variable like this?
>
> Are there some other things I should look at to understand what's
> going on?

If you want to play with time-dependent regression coefficients, have a
look at the timereg package and the book that it supports.

However, first you need to consider the possibility of selection
effects, which can arise even when the effects do not vary over time.
In the case at hand I would suspect a bias created by the fact that you
don't implant devices into people who are already dead.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
From my experience, what you are seeing is almost certainly a patient
selection effect. (The number 1 reason for puzzling results is incorrect
coding of a time-dependent covariate, but you appear to have been quite
careful.)

Assigning the implant as a non-time-dependent covariate almost
guarantees that the estimated effect will be beneficial. The only people
who get an implant are those who live longer than average (long enough
to get an implant). The size of such a bias is surprisingly large. The
problem is rediscovered in the cancer field every few years, in
comparisons of responders to non-responders.

As a time-dependent covariate, you have the problem of indication for
treatment. Say for instance that the devices were very expensive, and
were only used for patients in imminent danger of death. For a device
that was a placebo you would find, not surprisingly, that being selected
for implantation carried a major risk. The device may need to be
extremely effective to overcome this type of bias. As a simple example,
if you compare the death rate of those who have seen an oncologist
(cancer doc) in the last month to those who have not done so, you find
that the former group has a much higher death rate.

Terry Therneau
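The first bias described in this reply (only patients who live long enough can be implanted) can be illustrated with a small simulation. This is a sketch under invented assumptions, not a reanalysis of the data in the thread: the sample size, the exponential death and implant times and the 1000-day cutoff are all hypothetical, and the simulated implant has no effect on survival at all. The indication-for-treatment bias is not simulated.

```r
library(survival)
set.seed(1)

n       <- 2000
death   <- rexp(n, rate = 1 / 500)   # time of death, unaffected by the implant
implant <- rexp(n, rate = 1 / 300)   # day an implant would be offered
cutoff  <- 1000                      # administrative end of follow-up

end  <- pmin(death, cutoff)
dead <- as.integer(death <= cutoff)
got  <- implant < end                # implanted only if still alive and followed

# "Ever implanted" as a fixed covariate, one row per patient
fixed <- data.frame(end = end, dead = dead, implant = as.integer(got))
naive <- coxph(Surv(end, dead) ~ implant, data = fixed)

# Counting-process rows: before the implant (imp = 0) and after it (imp = 1)
pre  <- data.frame(tstart = 0,
                   tstop  = ifelse(got, implant, end),
                   status = ifelse(got, 0L, dead),
                   imp    = 0)
post <- data.frame(tstart = implant[got],
                   tstop  = end[got],
                   status = dead[got],
                   imp    = 1)
td <- coxph(Surv(tstart, tstop, status) ~ imp, data = rbind(pre, post))

print(c(fixed_covariate = unname(coef(naive)),
        time_dependent  = unname(coef(td))))
```

Under these assumptions the fixed-covariate fit shows a strongly protective coefficient for a device that does nothing, while the time-dependent fit stays close to zero.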
> I thought about this some more, and I'm not sure that possibility is
> "to blame." In my time-dependent model, I don't think I'm doing
> anything different than is done for transplant in the Stanford
> Heart Study (the often used example for this kind of time-dependent
> covariate). As in my case, you would not transplant a dead patient.
> So, I remain puzzled as to why my model is misbehaving.

The Stanford Heart Study, quoted in nearly every survival book as you
say, is a bit of an anomaly. At the time it was run, a good tissue match
between the donor heart and the recipient was considered very important.
When a donor became available, the best match (or near best) among those
waiting was chosen to receive it. Since the donor genetics are
unpredictable, this is essentially equivalent to a random pick from
those waiting. The Stanford study is nearly alone among examples of
time-dependent treatment in not having selection effects.

Terry T.
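The transplant model referred to in this exchange can be fit directly, assuming the `heart` data set distributed with the survival package (a counting-process version of the Stanford data with `start`, `stop`, `event` and `transplant` columns) is available in your installation. A minimal sketch mirroring the `Surv(Start, End, Status) ~ Imp` call earlier in the thread:

```r
library(survival)

# Each row of `heart` is an interval at risk; `transplant` switches on
# at the time of transplant, just as Imp does in the implant data.
fit <- coxph(Surv(start, stop, event) ~ transplant, data = heart)

print(summary(fit)$coefficients)
```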