Paul Miller
2011-Apr-01 02:18 UTC
[R] Cox Proportional Hazards model with a time-varying covariate
Hello Everyone, I'm learning how to perform various statistical analyses in R. I'm checking my understanding by replicating examples from my SAS books. Below is an attempt to replicate a Cox Proportional Hazards model with a time-varying covariate. I think I'm doing this correctly but am not completely sure. I would appreciate it if someone could double-check my results. In case people find it helpful, I've pasted the SAS code below as well. The R results I'm getting are similar to the SAS results but not very close. For example, my coefficient for TRT is .2913 in R but SAS gives .3450. I'd like to be able to run the R Code with method = "exact" to make it as same as the AS example but I can't seem to get it to work. R always says it's "not responding." Is there any chance it would run if I just waited long enough? And is there a better way I could do the analysis? I was thinking maybe I could do the analysis without restructuring my data, but I haven't been able to find anything like that. Thanks, Paul R Code: #### Example 22.2: Hyalurise in Vitreous Hemorrhage #### connection <- textConnection(" 101 32 1 A 3 . 102 20 1 A 4 10 103 -52 0 A 3 . 104 4 0 A 4 . 105 24 1 A 4 52 106 41 1 A 3 . 107 32 0 A 4 12 108 32 1 A 4 . 109 25 0 A 3 . 110 -10 1 A 3 . 111 8 0 A 3 . 112 32 1 A 4 . 113 -52 0 A 4 38 114 45 1 A 3 36 115 -14 0 A 4 . 116 6 1 A 3 . 117 -52 1 A 4 . 118 9 0 A 4 28 119 13 1 A 3 . 120 36 0 A 4 . 121 6 1 A 4 . 122 -42 0 A 4 . 123 44 1 A 3 28 124 -52 0 A 3 . 125 23 1 A 4 . 126 30 1 A 3 . 127 16 0 A 3 . 128 2 1 A 3 . 129 21 0 A 3 6 130 26 1 A 4 . 131 8 0 A 3 . 132 46 1 A 4 18 133 21 1 A 4 . 134 14 1 A 3 . 135 -52 0 A 3 . 136 -24 1 A 3 . 137 -22 0 A 3 . 138 -35 0 A 3 . 139 19 1 A 3 . 140 24 1 A 4 20 141 31 1 A 3 12 142 28 0 A 4 . 143 -52 0 A 4 34 144 -52 1 A 3 . 145 35 0 A 3 . 146 40 1 A 4 . 147 4 0 A 3 . 148 7 1 A 4 24 149 -2 0 A 4 . 150 16 1 A 4 . 151 36 1 A 3 4 152 12 0 A 3 . 153 28 1 A 3 . 154 48 0 A 3 42 155 -38 1 A 3 . 156 9 1 A 3 . 157 11 0 A 3 . 158 20 0 A 4 . 159 30 1 A 4 . 160 6 1 A 4 . 201 -15 0 B 3 . 202 -4 1 B 3 . 203 10 1 B 4 . 204 10 0 B 4 . 205 8 0 B 3 . 206 11 1 B 3 . 207 -52 1 B 4 . 208 12 1 B 3 . 209 20 0 B 3 . 210 46 1 B 4 . 211 32 0 B 4 44 212 10 1 B 3 . 213 42 0 B 3 . 214 31 1 B 4 20 215 4 1 B 4 . 216 -24 0 B 3 . 217 20 1 B 3 33 218 16 0 B 3 . 219 2 1 B 3 . 220 25 1 B 4 . 221 11 0 B 4 . 222 -52 0 B 4 . 223 20 1 B 4 . 224 9 1 B 3 . 225 22 1 B 3 . 226 23 1 B 3 . 227 11 0 B 3 . 228 13 1 B 3 . 229 -7 0 B 4 . 230 8 1 B 4 . 231 37 1 B 4 . 232 32 0 B 4 26 233 24 1 B 3 10 234 -52 1 B 4 . 235 20 0 B 4 . 236 34 0 B 4 . 237 27 1 B 3 . 301 10 1 C 3 . 302 -10 0 C 4 8 303 32 1 C 4 . 304 14 1 C 4 20 305 24 0 C 3 . 306 14 1 C 4 . 307 36 0 C 3 . 308 -52 0 C 3 . 309 7 1 C 3 . 310 6 0 C 4 . 311 5 1 C 3 . 312 9 1 C 3 . 313 12 1 C 4 . 314 21 0 C 3 . 315 10 1 C 3 . 316 26 0 C 4 36 317 -52 1 C 4 . 318 10 1 C 3 . 319 27 0 C 4 12 320 5 1 C 3 . 321 34 1 C 3 16 322 14 0 C 4 . 323 6 1 C 3 . 324 22 1 C 4 . 325 45 0 C 4 12 326 16 0 C 3 . 327 15 1 C 3 30 328 33 0 C 4 22 329 20 1 C 3 . 330 -52 1 C 3 30 331 5 0 C 4 . 332 10 1 C 3 . 333 16 0 C 3 . 334 -2 1 C 4 . 335 17 0 C 4 . 336 16 0 C 4 . 337 16 1 C 3 . 338 33 1 C 3 20 339 24 0 C 3 11 340 12 1 C 3 . 341 27 1 C 4 18 342 2 1 C 4 . 343 -6 0 C 4 . 344 -20 1 C 4 . 345 14 0 C 3 . 346 3 1 C 3 . ") vitclear <- data.frame(scan(connection, na.strings=".", list(PAT=0, RSPTIM=0, TRT=0, CENTER="", DENS=0, INFTIM=0))) #RSPTIM = time (wks) from randomization to response (censored if negative) #TRT = 1 for Hyalurise, TRT = 0 for Saline #CENTER = study center (A, B, or C) #DENS = 3, 4 for Grade 3 or 4, respectively #INFTIM = time (wks) from randomization to onset of infection complications vitclear$STATUS <- with(vitclear, ifelse(RSPTIM > 0, 1, 0)) vitclear$RSPTIM <- with(vitclear, abs(RSPTIM)) vitclear$CNT <- with(vitclear, ifelse(!is.na(INFTIM) & INFTIM < RSPTIM, 2, 1)) vitclear <- data.frame(lapply(vitclear, function(x) rep(x, vitclear$CNT))) vitclear$CNT <- NULL vitclear <- transform(vitclear, START = ifelse(!duplicated(PAT), 0, INFTIM), STOP = ifelse(!duplicated(PAT, fromLast=T), RSPTIM, INFTIM), INFCTN = ifelse(INFTIM > RSPTIM | is.na(INFTIM), 0, 1)) head(vitclear, 10) library("survival") model <- coxph(Surv(START, STOP, STATUS) ~ TRT + DENS + INFCTN, data = vitclear) summary(model) SAS CODE: /*Example 22.2 (Hyalurise in Vitreous Hemorrhage) */ DATA VITCLEAR; INPUT PAT RSPTIM TRT CENTER $ DENS INFTIM @@; CENS = (RSPTIM GE 0); RSPTIM = ABS(RSPTIM); DATALINES; 101 32 1 A 3 . 102 20 1 A 4 10 103 -52 0 A 3 . 104 4 0 A 4 . 105 24 1 A 4 52 106 41 1 A 3 . 107 32 0 A 4 12 108 32 1 A 4 . 109 25 0 A 3 . 110 -10 1 A 3 . 111 8 0 A 3 . 112 32 1 A 4 . 113 -52 0 A 4 38 114 45 1 A 3 36 115 -14 0 A 4 . 116 6 1 A 3 . 117 -52 1 A 4 . 118 9 0 A 4 28 119 13 1 A 3 . 120 36 0 A 4 . 121 6 1 A 4 . 122 -42 0 A 4 . 123 44 1 A 3 28 124 -52 0 A 3 . 125 23 1 A 4 . 126 30 1 A 3 . 127 16 0 A 3 . 128 2 1 A 3 . 129 21 0 A 3 6 130 26 1 A 4 . 131 8 0 A 3 . 132 46 1 A 4 18 133 21 1 A 4 . 134 14 1 A 3 . 135 -52 0 A 3 . 136 -24 1 A 3 . 137 -22 0 A 3 . 138 -35 0 A 3 . 139 19 1 A 3 . 140 24 1 A 4 20 141 31 1 A 3 12 142 28 0 A 4 . 143 -52 0 A 4 34 144 -52 1 A 3 . 145 35 0 A 3 . 146 40 1 A 4 . 147 4 0 A 3 . 148 7 1 A 4 24 149 -2 0 A 4 . 150 16 1 A 4 . 151 36 1 A 3 4 152 12 0 A 3 . 153 28 1 A 3 . 154 48 0 A 3 42 155 -38 1 A 3 . 156 9 1 A 3 . 157 11 0 A 3 . 158 20 0 A 4 . 159 30 1 A 4 . 160 6 1 A 4 . 201 -15 0 B 3 . 202 -4 1 B 3 . 203 10 1 B 4 . 204 10 0 B 4 . 205 8 0 B 3 . 206 11 1 B 3 . 207 -52 1 B 4 . 208 12 1 B 3 . 209 20 0 B 3 . 210 46 1 B 4 . 211 32 0 B 4 44 212 10 1 B 3 . 213 42 0 B 3 . 214 31 1 B 4 20 215 4 1 B 4 . 216 -24 0 B 3 . 217 20 1 B 3 33 218 16 0 B 3 . 219 2 1 B 3 . 220 25 1 B 4 . 221 11 0 B 4 . 222 -52 0 B 4 . 223 20 1 B 4 . 224 9 1 B 3 . 225 22 1 B 3 . 226 23 1 B 3 . 227 11 0 B 3 . 228 13 1 B 3 . 229 -7 0 B 4 . 230 8 1 B 4 . 231 37 1 B 4 . 232 32 0 B 4 26 233 24 1 B 3 10 234 -52 1 B 4 . 235 20 0 B 4 . 236 34 0 B 4 . 237 27 1 B 3 . 301 10 1 C 3 . 302 -10 0 C 4 8 303 32 1 C 4 . 304 14 1 C 4 20 305 24 0 C 3 . 306 14 1 C 4 . 307 36 0 C 3 . 308 -52 0 C 3 . 309 7 1 C 3 . 310 6 0 C 4 . 311 5 1 C 3 . 312 9 1 C 3 . 313 12 1 C 4 . 314 21 0 C 3 . 315 10 1 C 3 . 316 26 0 C 4 36 317 -52 1 C 4 . 318 10 1 C 3 . 319 27 0 C 4 12 320 5 1 C 3 . 321 34 1 C 3 16 322 14 0 C 4 . 323 6 1 C 3 . 324 22 1 C 4 . 325 45 0 C 4 12 326 16 0 C 3 . 327 15 1 C 3 30 328 33 0 C 4 22 329 20 1 C 3 . 330 -52 1 C 3 30 331 5 0 C 4 . 332 10 1 C 3 . 333 16 0 C 3 . 334 -2 1 C 4 . 335 17 0 C 4 . 336 16 0 C 4 . 337 16 1 C 3 . 338 33 1 C 3 20 339 24 0 C 3 11 340 12 1 C 3 . 341 27 1 C 4 18 342 2 1 C 4 . 343 -6 0 C 4 . 344 -20 1 C 4 . 345 14 0 C 3 . 346 3 1 C 3 . ; PROC PHREG DATA = VITCLEAR; MODEL RSPTIM*CENS(0) = TRT DENS INFCTN / TIES = EXACT; IF INFTIM GT RSPTIM OR INFTIM = . THEN INFCTN = 0; ELSE INFCTN = 1; STRATA CENTER; TITLE1 'Cox Proportional Hazards Model'; TITLE2 'Example 22.2: Hyalurise in Vitreous Hemorrhage' ; RUN; [[alternative HTML version deleted]]
Terry Therneau
2011-Apr-01 11:49 UTC
[R] Cox Proportional Hazards model with a time-varying covariate
> The R results I'm getting are similar to the SAS results but not very > close. For example, my coefficient for TRT is .2913 in R but SAS > gives .3450. I'd like to be able to run the R Code with method > "exact" to make it as same as the AS example but I can't seem to get > it to workYou need to read the documentation more carefully. The "exact partial likelihood" (Cox's label, not mine), which is appropriate for data on a discreete time scale, is invoked by the "exact" option for ties in R and by using the "discrete" option in SAS. (In hindsight I actually like their choice of a label somewhat better, since users more often make the correct association of method to data.) This method can take nearly forever to compute, however, when there are a lot of subjects tied at a given time. For instance at time 20 your data set has 10 events out of 77 subjects; the term in the exact partial likelihood requires a sum over all the possible subsets of 10 chosen from 77, approximately 10^12 terms: you might retire before it finishes. The "exact marginal likelihood" proposed by Prentice is invoked in SAS using ties=exact, it is not implimented in R. I have never found any compelling reason to program it. Statistical comment: in statistics an "exact" method means that the solution can be computed exactly, the label has absolutely no connection with the question of whether the method is a sensible thing to do. In the case of a Cox model, I prefer and strongly recommend using the Efron approximation for ties, this is the default in R. The Breslow approximation is usually quite good enough, this is the default in SAS. Terry Therneau
Paul Miller
2011-Apr-01 14:28 UTC
[R] Cox Proportional Hazards model with a time-varying covariate
Hi Terry and Kevin, Thanks for your replies to my post. And thanks particularly to Terry for pointing out my confusion about the "exact partial likelihood" vs. the "exact marginal likelihood." I'm learning and so am likely to be confused about such things from time to time. I had thought that a good way to see if I was replicating the results would be to use the Efron approximation in both R and SAS. I tried this just now and obtained a similar pattern of differences in the results. Please bear with me. I hope I'm not doing something silly. Below are my results in each program. I got the results for R simply by running the code I sent earlier. I got the results in SAS by changing TIES = EXACT to TIES = EFRON. Should I be getting results as different as these? Or does this suggest that there is some problem in how I've set up my data in R? I've looked over my data In R and everything looks fine, but maybe I've missed something. Thanks, Paul> summary(model)Call: coxph(formula = Surv(START, STOP, STATUS) ~ TRT + DENS + INFCTN, data = vitclear) n= 167 coef exp(coef) se(coef) z Pr(>|z|) TRT 0.2913 1.3382 0.1782 1.635 0.102 DENS -0.1762 0.8384 0.1739 -1.013 0.311 INFCTN 0.1192 1.1266 0.1912 0.624 0.533 exp(coef) exp(-coef) lower .95 upper .95 TRT 1.3382 0.7473 0.9436 1.898 DENS 0.8384 1.1927 0.5963 1.179 INFCTN 1.1266 0.8876 0.7746 1.639 Rsquare= 0.024 (max possible= 0.999 ) Likelihood ratio test= 4.12 on 3 df, p=0.2486 Wald test = 4.12 on 3 df, p=0.2485 Score (logrank) test = 4.15 on 3 df, p=0.2460 Cox Proportional Hazards Model 09:01 Friday, April 1, 2011 1 Example 22.2: Hyalurise in Vitreous Hemorrhage The PHREG Procedure Model Information Data Set WORK.VITCLEAR Dependent Variable RSPTIM Censoring Variable CENS Censoring Value(s) 0 Ties Handling EFRON Number of Observations Read 143 Number of Observations Used 143 Summary of the Number of Event and Censored Values Percent Stratum CENTER Total Event Censored Censored 1 A 60 45 15 25.00 2 B 37 30 7 18.92 3 C 46 39 7 15.22 ------------------------------------------------------------------- Total 143 114 29 20.28 Convergence Status Convergence criterion (GCONV=1E-8) satisfied. Cox Proportional Hazards Model 09:01 Friday, April 1, 2011 2 Example 22.2: Hyalurise in Vitreous Hemorrhage The PHREG Procedure Model Fit Statistics Without With Criterion Covariates Covariates -2 LOG L 711.142 706.171 AIC 711.142 712.171 SBC 711.142 720.379 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 4.9712 3 0.1739 Score 4.9604 3 0.1747 Wald 4.9151 3 0.1781 Analysis of Maximum Likelihood Estimates Parameter Standard Hazard Parameter DF Estimate Error Chi-Square Pr > ChiSq Ratio TRT 1 0.34437 0.19562 3.0988 0.0783 1.411 DENS 1 -0.23832 0.19585 1.4807 0.2237 0.788 INFCTN 1 0.12819 0.27780 0.2129 0.6445 1.137 [[alternative HTML version deleted]]