Hi all,

I am looking at the tutorial/appendix from John Fox on "Cox Proportional-Hazards Regression for Survival Data", available here:
http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-cox-regression.pdf

I am particularly interested in modelling survival with time-dependent covariates (Section 4). The data look like this:

> Rossi.2[1:50,]
     start stop arrest.time week arrest fin age race wexp mar paro prio educ employed
         0    1           0   20      1   0  27    1    0   0    1    3    3        0
         1    2           0   20      1   0  27    1    0   0    1    3    3        0
...
        18   19           0   20      1   0  27    1    0   0    1    3    3        0
        19   20           1   20      1   0  27    1    0   0    1    3    3        0
         0    1           0   17      1   0  18    1    0   0    1    8    4        0
         1    2           0   17      1   0  18    1    0   0    1    8    4        0
...
        15   16           0   17      1   0  18    1    0   0    1    8    4        0
        16   17           1   17      1   0  18    1    0   0    1    8    4        0
         0    1           0   25      1   0  19    0    1   0    1   13    3        0
         1    2           0   25      1   0  19    0    1   0    1   13    3        0
...
3.13    12   13           0   25      1   0  19    0    1   0    1   13    3        0

John suggests the following model:

    mod.allison.2 <- coxph(Surv(start, stop, arrest.time) ~
        fin + age + race + wexp + mar + paro + prio + employed,
        data=Rossi.2)

1- Would informing the algorithm coxph which samples represent the same person (through the use of an Id, for example) improve the "efficiency" of the estimated model? And if so, how should I do that? Using strata()?

2- He later suggests "accommodating non-proportional hazards by building interactions between covariates and time into the Cox regression model" as follows:

    mod.allison.5 <- coxph(Surv(start, stop, arrest.time) ~
        fin + age + age:stop + prio,
        data=Rossi.2)

I have read quite a lot of documentation to understand the meaning of "age + age:stop" in the formula, but I am still unsure what it means. If I wanted to visualise the variables that are entering the model, would it be something like:

    data.frame(Rossi.2$age, Rossi.2$age %in% Rossi.2$stop)

I hope this makes sense. Thanks for your help,

Ben
Terry Therneau
2010-Jul-02 13:25 UTC
[R] Modelling survival with time-dependent covariates
> 1- Would informing the algorithm coxph which samples represent the same person (through the use of an Id, for example) improve the "efficiency" of the estimated model? And if so, how should I do that? Using strata()?

No, it makes no change. The reason is that the (start, stop] notation is just a trick. At each death time the program needs to figure out what the covariates are for everyone else at that time; the start, stop pair lets it pick the right line for each subject. As long as a subject's intervals do not overlap, as (0,20] and (15,50] would, there is only one copy of the person at any time, and no 'correlated data' issue. (Overlap is weird -- it corresponds to two copies of me being in the room at the same time.)

If there are multiple events for a subject, then there is correlation (via a different mechanism), and addition of a cluster() term is needed.

> 2- He later suggests "accommodating non-proportional hazards by building interactions between covariates and time into the Cox regression model" as follows:
>
>     coxph(Surv(start, stop, arrest.time) ~ fin + age + age:stop + prio, ...

This trick ONLY works if
  a. the data set has been artificially divided (as your example has) into small uniform time increments, the same for each subject.
  b. the form of the non-ph is actually a linear change in beta over time.

Use cox.zph on the original model to look at this. When I see non-ph (the plot from cox.zph is not horizontal), life is rarely so simple.

Terry Therneau
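The points above can be sketched in R. This is a minimal illustration, not the code from the tutorial: the `toy` data frame and its values are made up, and the `veteran` and `bladder2` data sets bundled with the survival package stand in for the Rossi data. In a formula, `age:stop` is the elementwise product of `age` and `stop` (not a membership test as `%in%` would be), so the log hazard ratio for age at time t is modelled as beta_age + beta_age:stop * t, which is the linear change in beta mentioned in point b.

```r
library(survival)  # recommended package, included with standard R installs

# Hypothetical counting-process rows (NOT the Rossi data):
# two subjects, each split into one-week (start, stop] intervals.
toy <- data.frame(
  id    = c(1, 1, 1, 2, 2),
  start = c(0, 1, 2, 0, 1),
  stop  = c(1, 2, 3, 1, 2),
  age   = c(27, 27, 27, 18, 18)
)

# What `age + age:stop` contributes to the model: one column for age
# and one column holding the product age * stop.
X <- model.matrix(~ age + age:stop, data = toy)
print(X)

# Checking proportional hazards on an ordinary (unsplit) model,
# using the bundled `veteran` data as a stand-in.
fit <- coxph(Surv(time, status) ~ karno + age, data = veteran)
zp  <- cox.zph(fit)
print(zp)                    # per-covariate tests plus a global test
if (interactive()) plot(zp)  # a roughly horizontal curve is consistent with PH

# Multiple events per subject: add a cluster() term on the subject id.
# `bladder2` holds recurrent-event data in (start, stop] form.
fit_cl <- coxph(Surv(start, stop, event) ~ rx + size + number + cluster(id),
                data = bladder2)
print(fit_cl)                # reports robust standard errors
```

Note that coxph() has no intercept, so only the `age` and `age:stop` columns of `X` would enter the fit; the intercept column appears here only because model.matrix() adds one by default.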