I would like to be able to construct hazard rates (or unconditional death prob) for many subjects from a given survfit. This will involve adjusting the ( n.event/n.risk) with (coxph object )$linear.predictors I must be having another silly day as I cannot reproduce the linear predictor: fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) fit$linear.predictors[1] [1] 2.612756 coef(fit)*model.matrix(fit)[1,1] age 11.69021 The above is based on the help listing for coxph.object coefficients: the coefficients of the linear predictor, which multiply the columns of the model matrix. If the model is over-determined there will be missing values in the vector corresponding to the redundant columns in the model matrix. Also, please comment whether n.event/n.risk gives the baseline hazard exp(alpha) ? Please, help. Thanks a lot. Stephen B [[alternative HTML version deleted]]
On Oct 27, 2010, at 12:12 PM, Bond, Stephen wrote:> I would like to be able to construct hazard rates (or unconditional > death prob)Hazards are not probabilities (since probabilities are constrained to the range [0,1] and hazards are unbounded upward.)> for many subjects from a given survfit. > > This will involve adjusting the ( n.event/n.risk) > with (coxph object )$linear.predictors > I must be having another silly day as I cannot reproduce the linear > predictor: > > fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) > fit$linear.predictors[1] > [1] 2.612756That's the linear predictor (the beta*X) and that particular number only applies to the first case.> > coef(fit)*model.matrix(fit)[1,1] > age > 11.69021 >I don't know what that might be and you are not telling us what you think it is.> The above is based on the help listing for coxph.object > coefficients: the coefficients of the linear predictor, which multiply > the columns of the model matrix. If the model is > over-determined there will be missing values in the vector > corresponding to the redundant columns in the model matrix. > > Also, please comment whether n.event/n.riskThe Nelson-Aalen estimator of the cumulative hazard as a function of intervals prior to t is sum( n-event(t)/ n.risk(t))> gives the baseline hazard exp(alpha) ?No. The "baseline hazard", as you are calling this, would be an estimate for persons with all covariates = 0, so in this case is for women of age=0. (Not a particularly interpretable result in many situations. The baseline hazard following treated ovarian cancer for neonates is not medically sensible.) What is the purpose of this request? Is someone telling you you need to provide estimates for instantaneous hazards? -- David Winsemius, MD West Hartford, CT
Gentlemen, I read R-news in batch mode so I'm often a day behind. Let me try to answer some of the questions. 1. X*beta != linear.predictor. I'm sorry if the documentation isn't all it could be. Between the book, tech report, and help I've written about 400 pages, but this particular topic isn't yet in it. The final snipe about being "opaque like SAS" was really unfair. The Cox model is a relative risk model, if lp is a linear predictor then so is lp +c for any constant; they are equally good and equally valid. The linear.predictor component in a coxph fit is (X-means) * beta. The computation exp(lp) occurs multiple times downstream and this keep the exp function from overflowing when there is something like a Date object as a predictor. Adding this constant changes not a single downstream calcuation. 2. Survfit is too slow. I'd like to hear more about this. My work mostly involves modest data sets so perhaps I haven't seen it. Accuracy and maintainability have been my first worries. 3. Baseline survival. Let xbase be a particular set of values for the x covariates (one for each). The survival curve for a given xbase is obtained from survfit fit <- coxph(.... sfit <- survfit(fit, newdata=xbase) chaz <- -log(sfit$surv) #cumulative hazard (The xbase vector will need to have variable names for the function to know which value goes to which of course). The cumulative hazard for any other subject will be newhaz <- chaz * exp(fit$coef%*% (x-xbase)) There is not a simple transformation of the standard error from one fit to another, however. You will need to call survfit with a data frame for newdata, which will return one curve per row with the proper values. In my view there is no such thing as "A" baseline survival curve. Any xbase you chose is a baseline. However, it is wise to choose something near the center of the data space in order to avoid numeric problems with the exp function above. I would never ever chose a vector of zeros, although some text books do -- it saves them about 8 characters of typing in the newhaz formula above. Terry Therneau