On Fri, 10 Jun 2005, Hanke, Alex wrote:
> Dear All,
> I'm having just a little terminology problem, relating the language used in
> the Hosmer and Lemeshow text on Applied Survival Analysis to that of the
> help that comes with the survival package.
>
> I am trying to back out the values for the baseline hazard, h_o(t_i), for
> each event time or observation time.
> Now survfit(fit)$surv gives me the value of the survival function,
> S(t_i|X_i,B), using mean values of the covariates and the coxph() object
> provides me with the estimate of the linear predictors, exp(X'B).
> If S(t_i|X_i,B)=S_o(t_i)^exp(X_iB) is the expression for the survival
> function
> And
> -ln(S_o(t_i) ) is the expression for the cumulative baseline hazard
> function, H_o(t_i)
> Then by rearranging the expression for the survival function I get the
> following:
> -ln(S_o(t_i) ) = -ln( S(t_i|X_i,B) ) / exp(X_iB)
> = basehaz(fit)/exp(fit$linear.predictors)
> Am I right so far and is there an easier way?
No, and yes.
You are dividing the centered baseline hazard at each time point by
exp(linear predictor) for the person who happened to die at that time,
rather than by exp(linear predictor) at the mean covariates.
basehaz(fit, centered=FALSE) will get you the baseline hazard at zero
covariates.
You don't even need that. The baseline hazard at zero covariates is
constant if and only if the centered baseline hazard is constant, so you
could also work with basehaz(fit), which is often more numerically stable.
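A minimal sketch of the two calls, to show how they relate. The lung
data set (shipped with the survival package) and the covariates age and
sex are my assumptions, chosen only for illustration; they are not your
model.

```r
library(survival)

# Illustrative model only: the data set and covariates are assumptions.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Cumulative baseline hazard at zero covariates
H0 <- basehaz(fit, centered = FALSE)

# Cumulative baseline hazard at the reference ("mean") covariate values
Hc <- basehaz(fit, centered = TRUE)

# The two curves differ by a single constant factor, so one is linear
# in time exactly when the other is.
scale <- exp(sum(coef(fit) * fit$means))
all.equal(Hc$hazard, H0$hazard * scale)
```

Both calls return a data frame with columns hazard (the cumulative
hazard) and time.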
> The plot of the cumulative baseline hazard function, H_o(t_i), should be
> linear across time. Once I have H_o(t_i), to get at h_o(t_i) I then need
> to reverse the cumsum operation. The corresponding plot should have a
> constant baseline hazard over time.
No. Not at all.
Unless you smooth the h_0(t_i) they are completely useless for what you
want.
Suppose the hazard rate is constant and you have no covariates in the
model and not even any censoring. In that case the increments of the
estimated cumulative baseline hazard are 1/n, 1/(n-1), 1/(n-2), ..., 1/2, 1,
where n is the sample size. So in this simplest possible case a constant
baseline hazard rate leads to h_0(t_i) increasing with t.
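The arithmetic can be checked in a few lines. This is only a sketch of
the no-covariate, no-censoring case described above; n = 10 is an
arbitrary value I picked for illustration.

```r
n <- 10                      # illustrative sample size (an assumption)
at_risk <- n:1               # number still at risk just before each ordered death
increments <- 1 / at_risk    # 1/n, 1/(n-1), ..., 1/2, 1
H <- cumsum(increments)      # estimated cumulative hazard at each death time

# The raw increments grow with every death, whatever the true hazard is.
print(round(increments, 3))
print(all(diff(increments) > 0))
```

The increments depend only on how many people are still at risk, not on
the death times themselves, which is why they say nothing about the
hazard rate until they are smoothed against time.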
The proper smoothing is a little tricky, because the failure distribution
is skewed and has a boundary at zero, and because of censoring. That's
why textbooks often recommend graphing the cumulative hazard to see if it
is linear rather than the increments in the cumulative hazard to see if
they are constant.
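A rough sketch of that textbook plot. Again the lung data and the
covariates are assumptions for illustration, and the dashed line is just
a least-squares line through the origin that I added as a visual aid,
not a formal test of linearity.

```r
library(survival)

# Illustrative model only: the data set and covariates are assumptions.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
H <- basehaz(fit)   # cumulative baseline hazard at the reference covariates

# Step plot of the cumulative hazard against time
plot(H$time, H$hazard, type = "s",
     xlab = "Time", ylab = "Cumulative baseline hazard")

# Straight line through the origin for visual comparison
slope <- coef(lm(hazard ~ 0 + time, data = H))
abline(a = 0, b = slope, lty = 2)
```

If the steps track the dashed line reasonably well, a constant baseline
hazard is plausible; systematic curvature suggests it is not.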
-thomas