Paul Miller
2011-Apr-01  02:18 UTC
[R] Cox Proportional Hazards model with a time-varying covariate
Hello Everyone,
 
I'm learning how to perform various statistical analyses in R. I'm
checking my understanding by replicating examples from my SAS books. Below is an
attempt to replicate a Cox Proportional Hazards model with a time-varying
covariate. I think I'm doing this correctly but am not completely sure. I
would appreciate it if someone could double-check my results. In case people
find it helpful, I've pasted the SAS code below as well.  
 
The R results I'm getting are similar to the SAS results but not very close.
For example, my coefficient for TRT is .2913 in R but SAS gives .3450. I'd
like to be able to run the R Code with method  = "exact" to make it as
same as the AS example but I can't seem to get it to work. R always says
it's "not responding." Is there any chance it would run if I just
waited long enough? And is there a better way I could do the analysis? I was
thinking maybe I could do the analysis without restructuring my data, but I
haven't been able to find anything like that. 
 
Thanks,
 
Paul
 
 
R Code:
#### Example 22.2: Hyalurise in Vitreous Hemorrhage ####
 
connection <- textConnection("
101  32 1 A 3  .   102  20 1 A 4 10   103 -52 0 A 3  .
104   4 0 A 4  .   105  24 1 A 4 52   106  41 1 A 3  .
107  32 0 A 4 12   108  32 1 A 4  .   109  25 0 A 3  .
110 -10 1 A 3  .   111   8 0 A 3  .   112  32 1 A 4  .
113 -52 0 A 4 38   114  45 1 A 3 36   115 -14 0 A 4  .
116   6 1 A 3  .   117 -52 1 A 4  .   118   9 0 A 4 28
119  13 1 A 3  .   120  36 0 A 4  .   121   6 1 A 4  .
122 -42 0 A 4  .   123  44 1 A 3 28   124 -52 0 A 3  .
125  23 1 A 4  .   126  30 1 A 3  .   127  16 0 A 3  .
128   2 1 A 3  .   129  21 0 A 3  6   130  26 1 A 4  .
131   8 0 A 3  .   132  46 1 A 4 18   133  21 1 A 4  .
134  14 1 A 3  .   135 -52 0 A 3  .   136 -24 1 A 3  .
137 -22 0 A 3  .   138 -35 0 A 3  .   139  19 1 A 3  .
140  24 1 A 4 20   141  31 1 A 3 12   142  28 0 A 4  .
143 -52 0 A 4 34   144 -52 1 A 3  .   145  35 0 A 3  .
146  40 1 A 4  .   147   4 0 A 3  .   148   7 1 A 4 24
149  -2 0 A 4  .   150  16 1 A 4  .   151  36 1 A 3  4
152  12 0 A 3  .   153  28 1 A 3  .   154  48 0 A 3 42
155 -38 1 A 3  .   156   9 1 A 3  .   157  11 0 A 3  .
158  20 0 A 4  .   159  30 1 A 4  .   160   6 1 A 4  .
201 -15 0 B 3  .   202  -4 1 B 3  .   203  10 1 B 4  .
204  10 0 B 4  .   205   8 0 B 3  .   206  11 1 B 3  .
207 -52 1 B 4  .   208  12 1 B 3  .   209  20 0 B 3  .
210  46 1 B 4  .   211  32 0 B 4 44   212  10 1 B 3  .
213  42 0 B 3  .   214  31 1 B 4 20   215   4 1 B 4  .
216 -24 0 B 3  .   217  20 1 B 3 33   218  16 0 B 3  .
219   2 1 B 3  .   220  25 1 B 4  .   221  11 0 B 4  .
222 -52 0 B 4  .   223  20 1 B 4  .   224   9 1 B 3  .
225  22 1 B 3  .   226  23 1 B 3  .   227  11 0 B 3  .
228  13 1 B 3  .   229  -7 0 B 4  .   230   8 1 B 4  .
231  37 1 B 4  .   232  32 0 B 4 26   233  24 1 B 3 10
234 -52 1 B 4  .   235  20 0 B 4  .   236  34 0 B 4  .
237  27 1 B 3  .   301  10 1 C 3  .   302 -10 0 C 4  8
303  32 1 C 4  .   304  14 1 C 4 20   305  24 0 C 3  .
306  14 1 C 4  .   307  36 0 C 3  .   308 -52 0 C 3  .
309   7 1 C 3  .   310   6 0 C 4  .   311   5 1 C 3  .
312   9 1 C 3  .   313  12 1 C 4  .   314  21 0 C 3  .
315  10 1 C 3  .   316  26 0 C 4 36   317 -52 1 C 4  .
318  10 1 C 3  .   319  27 0 C 4 12   320   5 1 C 3  .
321  34 1 C 3 16   322  14 0 C 4  .   323   6 1 C 3  .
324  22 1 C 4  .   325  45 0 C 4 12   326  16 0 C 3  .
327  15 1 C 3 30   328  33 0 C 4 22   329  20 1 C 3  .
330 -52 1 C 3 30   331   5 0 C 4  .   332  10 1 C 3  .
333  16 0 C 3  .   334  -2 1 C 4  .   335  17 0 C 4  .
336  16 0 C 4  .   337  16 1 C 3  .   338  33 1 C 3 20
339  24 0 C 3 11   340  12 1 C 3  .   341  27 1 C 4 18
342   2 1 C 4  .   343  -6 0 C 4  .   344 -20 1 C 4  .
345  14 0 C 3  .   346   3 1 C 3  .
")
 
vitclear <- data.frame(scan(connection, na.strings=".",
  list(PAT=0, RSPTIM=0, TRT=0, CENTER="", DENS=0, INFTIM=0)))
 
#RSPTIM = time (wks) from randomization to response (censored if negative)
#TRT    = 1 for Hyalurise, TRT = 0 for Saline
#CENTER = study center (A, B, or C)
#DENS   = 3, 4 for Grade 3 or 4, respectively
#INFTIM = time (wks) from randomization to onset of infection complications
 
vitclear$STATUS <- with(vitclear, ifelse(RSPTIM > 0, 1, 0))
vitclear$RSPTIM <- with(vitclear, abs(RSPTIM))
vitclear$CNT <- with(vitclear, ifelse(!is.na(INFTIM) & INFTIM <
RSPTIM, 2, 1))
vitclear <- data.frame(lapply(vitclear, function(x) rep(x, vitclear$CNT)))
vitclear$CNT <- NULL
 
vitclear <- transform(vitclear,
    START = ifelse(!duplicated(PAT), 0, INFTIM),  
    STOP  = ifelse(!duplicated(PAT, fromLast=T), RSPTIM, INFTIM),
    INFCTN = ifelse(INFTIM > RSPTIM | is.na(INFTIM), 0, 1))
 
head(vitclear, 10)
 
library("survival")
model <- coxph(Surv(START, STOP, STATUS) ~ TRT + DENS + INFCTN, data =
vitclear)
summary(model)
 
SAS CODE:
 
/*Example 22.2 (Hyalurise in Vitreous Hemorrhage) */
 
DATA VITCLEAR;
    INPUT PAT RSPTIM TRT CENTER $ DENS INFTIM @@;
    CENS   = (RSPTIM GE 0);
    RSPTIM = ABS(RSPTIM);
    DATALINES;
101  32 1 A 3  .   102  20 1 A 4 10   103 -52 0 A 3  .
104   4 0 A 4  .   105  24 1 A 4 52   106  41 1 A 3  .
107  32 0 A 4 12   108  32 1 A 4  .   109  25 0 A 3  .
110 -10 1 A 3  .   111   8 0 A 3  .   112  32 1 A 4  .
113 -52 0 A 4 38   114  45 1 A 3 36   115 -14 0 A 4  .
116   6 1 A 3  .   117 -52 1 A 4  .   118   9 0 A 4 28
119  13 1 A 3  .   120  36 0 A 4  .   121   6 1 A 4  .
122 -42 0 A 4  .   123  44 1 A 3 28   124 -52 0 A 3  .
125  23 1 A 4  .   126  30 1 A 3  .   127  16 0 A 3  .
128   2 1 A 3  .   129  21 0 A 3  6   130  26 1 A 4  .
131   8 0 A 3  .   132  46 1 A 4 18   133  21 1 A 4  .
134  14 1 A 3  .   135 -52 0 A 3  .   136 -24 1 A 3  .
137 -22 0 A 3  .   138 -35 0 A 3  .   139  19 1 A 3  .
140  24 1 A 4 20   141  31 1 A 3 12   142  28 0 A 4  .
143 -52 0 A 4 34   144 -52 1 A 3  .   145  35 0 A 3  .
146  40 1 A 4  .   147   4 0 A 3  .   148   7 1 A 4 24
149  -2 0 A 4  .   150  16 1 A 4  .   151  36 1 A 3  4
152  12 0 A 3  .   153  28 1 A 3  .   154  48 0 A 3 42
155 -38 1 A 3  .   156   9 1 A 3  .   157  11 0 A 3  .
158  20 0 A 4  .   159  30 1 A 4  .   160   6 1 A 4  .
201 -15 0 B 3  .   202  -4 1 B 3  .   203  10 1 B 4  .
204  10 0 B 4  .   205   8 0 B 3  .   206  11 1 B 3  .
207 -52 1 B 4  .   208  12 1 B 3  .   209  20 0 B 3  .
210  46 1 B 4  .   211  32 0 B 4 44   212  10 1 B 3  .
213  42 0 B 3  .   214  31 1 B 4 20   215   4 1 B 4  .
216 -24 0 B 3  .   217  20 1 B 3 33   218  16 0 B 3  .
219   2 1 B 3  .   220  25 1 B 4  .   221  11 0 B 4  .
222 -52 0 B 4  .   223  20 1 B 4  .   224   9 1 B 3  .
225  22 1 B 3  .   226  23 1 B 3  .   227  11 0 B 3  .
228  13 1 B 3  .   229  -7 0 B 4  .   230   8 1 B 4  .
231  37 1 B 4  .   232  32 0 B 4 26   233  24 1 B 3 10
234 -52 1 B 4  .   235  20 0 B 4  .   236  34 0 B 4  .
237  27 1 B 3  .   301  10 1 C 3  .   302 -10 0 C 4  8
303  32 1 C 4  .   304  14 1 C 4 20   305  24 0 C 3  .
306  14 1 C 4  .   307  36 0 C 3  .   308 -52 0 C 3  .
309   7 1 C 3  .   310   6 0 C 4  .   311   5 1 C 3  .
312   9 1 C 3  .   313  12 1 C 4  .   314  21 0 C 3  .
315  10 1 C 3  .   316  26 0 C 4 36   317 -52 1 C 4  .
318  10 1 C 3  .   319  27 0 C 4 12   320   5 1 C 3  .
321  34 1 C 3 16   322  14 0 C 4  .   323   6 1 C 3  .
324  22 1 C 4  .   325  45 0 C 4 12   326  16 0 C 3  .
327  15 1 C 3 30   328  33 0 C 4 22   329  20 1 C 3  .
330 -52 1 C 3 30   331   5 0 C 4  .   332  10 1 C 3  .
333  16 0 C 3  .   334  -2 1 C 4  .   335  17 0 C 4  .
336  16 0 C 4  .   337  16 1 C 3  .   338  33 1 C 3 20
339  24 0 C 3 11   340  12 1 C 3  .   341  27 1 C 4 18
342   2 1 C 4  .   343  -6 0 C 4  .   344 -20 1 C 4  .
345  14 0 C 3  .   346   3 1 C 3  .
;
 
PROC PHREG DATA = VITCLEAR;
    MODEL RSPTIM*CENS(0) = TRT DENS INFCTN / TIES = EXACT;
    IF INFTIM GT RSPTIM OR INFTIM = . THEN INFCTN = 0;
    ELSE INFCTN = 1;
    STRATA CENTER;
    TITLE1 'Cox Proportional Hazards Model';
    TITLE2 'Example 22.2: Hyalurise in Vitreous Hemorrhage' ;
RUN;
	[[alternative HTML version deleted]]
Terry Therneau
2011-Apr-01  11:49 UTC
[R] Cox Proportional Hazards model with a time-varying covariate
> The R results I'm getting are similar to the SAS results but not very > close. For example, my coefficient for TRT is .2913 in R but SAS > gives .3450. I'd like to be able to run the R Code with method > "exact" to make it as same as the AS example but I can't seem to get > it to workYou need to read the documentation more carefully. The "exact partial likelihood" (Cox's label, not mine), which is appropriate for data on a discreete time scale, is invoked by the "exact" option for ties in R and by using the "discrete" option in SAS. (In hindsight I actually like their choice of a label somewhat better, since users more often make the correct association of method to data.) This method can take nearly forever to compute, however, when there are a lot of subjects tied at a given time. For instance at time 20 your data set has 10 events out of 77 subjects; the term in the exact partial likelihood requires a sum over all the possible subsets of 10 chosen from 77, approximately 10^12 terms: you might retire before it finishes. The "exact marginal likelihood" proposed by Prentice is invoked in SAS using ties=exact, it is not implimented in R. I have never found any compelling reason to program it. Statistical comment: in statistics an "exact" method means that the solution can be computed exactly, the label has absolutely no connection with the question of whether the method is a sensible thing to do. In the case of a Cox model, I prefer and strongly recommend using the Efron approximation for ties, this is the default in R. The Breslow approximation is usually quite good enough, this is the default in SAS. Terry Therneau
Paul Miller
2011-Apr-01  14:28 UTC
[R] Cox Proportional Hazards model with a time-varying covariate
Hi Terry and Kevin, Thanks for your replies to my post. And thanks particularly to Terry for pointing out my confusion about the "exact partial likelihood" vs. the "exact marginal likelihood." I'm learning and so am likely to be confused about such things from time to time. I had thought that a good way to see if I was replicating the results would be to use the Efron approximation in both R and SAS. I tried this just now and obtained a similar pattern of differences in the results. Please bear with me. I hope I'm not doing something silly. Below are my results in each program. I got the results for R simply by running the code I sent earlier. I got the results in SAS by changing TIES = EXACT to TIES = EFRON. Should I be getting results as different as these? Or does this suggest that there is some problem in how I've set up my data in R? I've looked over my data In R and everything looks fine, but maybe I've missed something. Thanks, Paul> summary(model)Call: coxph(formula = Surv(START, STOP, STATUS) ~ TRT + DENS + INFCTN, data = vitclear) n= 167 coef exp(coef) se(coef) z Pr(>|z|) TRT 0.2913 1.3382 0.1782 1.635 0.102 DENS -0.1762 0.8384 0.1739 -1.013 0.311 INFCTN 0.1192 1.1266 0.1912 0.624 0.533 exp(coef) exp(-coef) lower .95 upper .95 TRT 1.3382 0.7473 0.9436 1.898 DENS 0.8384 1.1927 0.5963 1.179 INFCTN 1.1266 0.8876 0.7746 1.639 Rsquare= 0.024 (max possible= 0.999 ) Likelihood ratio test= 4.12 on 3 df, p=0.2486 Wald test = 4.12 on 3 df, p=0.2485 Score (logrank) test = 4.15 on 3 df, p=0.2460 Cox Proportional Hazards Model 09:01 Friday, April 1, 2011 1 Example 22.2: Hyalurise in Vitreous Hemorrhage The PHREG Procedure Model Information Data Set WORK.VITCLEAR Dependent Variable RSPTIM Censoring Variable CENS Censoring Value(s) 0 Ties Handling EFRON Number of Observations Read 143 Number of Observations Used 143 Summary of the Number of Event and Censored Values Percent Stratum CENTER Total Event Censored Censored 1 A 60 45 15 25.00 2 B 37 30 7 18.92 3 C 46 39 7 15.22 ------------------------------------------------------------------- Total 143 114 29 20.28 Convergence Status Convergence criterion (GCONV=1E-8) satisfied. Cox Proportional Hazards Model 09:01 Friday, April 1, 2011 2 Example 22.2: Hyalurise in Vitreous Hemorrhage The PHREG Procedure Model Fit Statistics Without With Criterion Covariates Covariates -2 LOG L 711.142 706.171 AIC 711.142 712.171 SBC 711.142 720.379 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 4.9712 3 0.1739 Score 4.9604 3 0.1747 Wald 4.9151 3 0.1781 Analysis of Maximum Likelihood Estimates Parameter Standard Hazard Parameter DF Estimate Error Chi-Square Pr > ChiSq Ratio TRT 1 0.34437 0.19562 3.0988 0.0783 1.411 DENS 1 -0.23832 0.19585 1.4807 0.2237 0.788 INFCTN 1 0.12819 0.27780 0.2129 0.6445 1.137 [[alternative HTML version deleted]]