I have a very general question about what the centering option in basehaz
does to factors. (basehaz computes the baseline cumulative hazard for a
coxph object using the Breslow estimator.)

Let's say I'm interested in a survival model with two (dichotomous) factors
and a continuous covariate.

Variable     Possible Values
Factor1      0 or 1
Factor2      0 or 1
Covariate    0 to 100

I fit my model:
modelname <- coxph(Surv ~ Factor1 + Factor2 + Covariate, data = data)

If I then ask for:
baselineA <- basehaz(modelname, centered=FALSE)
I am fairly certain that baselineA will provide me with the cumulative
hazard evaluated at Factor1 = 0, Factor2 = 0, Covariate = 0.

Yet, if I ask for:
baselineB <- basehaz(modelname, centered=TRUE)
I know that baselineB will evaluate the cumulative hazard at Covariate = 50,
but am uncertain as to what it does with the factors. I would not think
that the function would attempt to average a "factor"; however, I cannot
find any documentation to support my assumption. To make sure, does anyone
know how basehaz (centered = TRUE/FALSE) handles models that include both
categorical factors and continuous covariates?

Thanks in advance,
Dave Koons

On Thu, 27 Sep 2007, David Koons wrote:

> I have a very general question about what the centering option in basehaz
> does to factors. (basehaz computes the baseline cumulative hazard for a
> coxph object using the Breslow estimator).
>
> Lets say I'm interested in a survival model with two (dichotomous) factors
> and a continuous covariate.
>
> Variable     Possible Values
> Factor1      0 or 1
> Factor2      0 or 1
> Covariate    0 to 100
>
> I fit my model:
> modelname <- coxph(Surv ~ Factor1 + Factor2 + Covariate, data = data)
>
> If I then ask for:
> baselineA <- basehaz(modelname, centered=FALSE)
> I am fairly certain that baselineA will provide me with the cumulative
> hazard evaluated at Factor1 = 0, Factor2 = 0, Covariate = 0.
>
> Yet, if I ask for: baselineB <- basehaz(modelname, centered=TRUE) I know
> that baselineB will evaluate the cumulative hazard at Covariate = 50,
> but am uncertain as to what it does with the factors. I would not think
> that the function would attempt to average a "factor"; however, I cannot
> find any documentation to support my assumption. To make sure, does

It does average factors. From ?coxph.object

   means: vector of column means of the X matrix. Subsequent survival
          curves are adjusted to this value.

and so the centring is about the means of the design matrix (X) after
expansion of categorical variables. The code for basehaz is very simple:
just list it to see that all it does is to remove the centring of the
columns of the design matrix.

> anyone know how basehaz (centered = TRUE/FALSE) handles models that
> include both categorical factors and continuous covariates?
>
> Thanks in advance,
> Dave Koons

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
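
The relationship described in the reply above can be checked numerically. What follows is only a sketch under stated assumptions, not the source of basehaz: the `lung` data that ships with survival stands in for the poster's data, and `female` is a hypothetical 0/1 column created for the illustration. The sketch prints the reference values stored in the fit next to the plain column means of the design matrix, then checks that the uncentered baseline is the centered one rescaled by exp(-sum(coef * means)). Depending on the installed version of survival, the stored `means` may not match the plain column means for 0/1 columns, so the two are printed side by side rather than assumed equal.

```r
library(survival)

# Stand-in data: 'lung' ships with survival; 'female' is a made-up 0/1 column
d <- lung
d$female <- as.numeric(d$sex == 2)

fit <- coxph(Surv(time, status) ~ female + age, data = d)

# Reference values stored in the fit vs. plain column means of the design matrix
ref_means <- fit$means
col_means <- colMeans(model.matrix(fit))
print(rbind(ref_means, col_means))

# Baseline cumulative hazard at the reference values and at all-zero covariates
Hc <- basehaz(fit, centered = TRUE)
H0 <- basehaz(fit, centered = FALSE)

# The two should differ only by the constant factor exp(-sum(coef * means))
shift <- sum(coef(fit) * fit$means)
max_diff <- max(abs(H0$hazard - Hc$hazard * exp(-shift)))
print(max_diff)
```

If `max_diff` is numerically zero, the only difference between the two baselines is the constant rescaling, which is the "removing the centring" step mentioned in the reply.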

On Thu, 27 Sep 2007, David Koons wrote:

> I have a very general question about what the centering option in
> basehaz does to factors. (basehaz computes the baseline cumulative
> hazard for a coxph object using the Breslow estimator).
>
> Lets say I'm interested in a survival model with two (dichotomous) factors
> and a continuous covariate.
>
> Variable     Possible Values
> Factor1      0 or 1
> Factor2      0 or 1
> Covariate    0 to 100
>
> I fit my model:
> modelname <- coxph(Surv ~ Factor1 + Factor2 + Covariate, data = data)
>
> If I then ask for:
> baselineA <- basehaz(modelname, centered=FALSE)
> I am fairly certain that baselineA will provide me with the cumulative
> hazard evaluated at Factor1 = 0, Factor2 = 0, Covariate = 0.

Indeed

> Yet, if I ask for:
> baselineB <- basehaz(modelname, centered=TRUE)
> I know that baselineB will evaluate the cumulative hazard at Covariate =
> 50

Only if 50 is the mean.

>, but am uncertain as to what it does with the factors. I would not
> think that the function would attempt to average a "factor"; however, I
> cannot find any documentation to support my assumption. To make sure,
> does anyone know how basehaz (centered = TRUE/FALSE) handles models that
> include both categorical factors and continuous covariates?

Yes, someone does (perhaps many people).

It averages the columns of the design matrix. This is not quite 'averaging
factors' as the result depends on the choice of contrasts as well as on the
coding of the factor. In your example it isn't clear whether 'Factor1' and
'Factor2' are defined as factors, but with the default coding and default
contrasts it doesn't affect the answer.

The mean is the default centering for survfit.coxph(), partly because it is
the centering used internally to fit the Cox model. The main point of
basehaz() is to provide centered=FALSE for people who want it (or think
they do). You can get survival curves for any covariate values you like
from survfit.coxph().

	-thomas
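
The last suggestion, getting curves at chosen covariate values from survfit.coxph(), might look like the sketch below. The assumptions are the same as before: `lung` is a stand-in data set, `female` is a hypothetical 0/1 column, and the two profiles (age 60, female = 0 or 1) are made up for the illustration. The curves are requested through the `newdata` argument of survfit(), and the cumulative hazard is recovered as -log(survival).

```r
library(survival)

# Stand-in data: 'lung' ships with survival; 'female' is a made-up 0/1 column
d <- lung
d$female <- as.numeric(d$sex == 2)

fit <- coxph(Surv(time, status) ~ female + age, data = d)

# Two hypothetical covariate profiles: age 60 with female = 0 and female = 1
profiles <- data.frame(female = c(0, 1), age = c(60, 60))
sf <- survfit(fit, newdata = profiles)

# Cumulative hazard for each profile, one column per row of 'profiles'
H <- -log(sf$surv)

# Under proportional hazards the two curves differ by the factor exp(coef)
pos <- H[, 1] > 0
ratio <- H[pos, 2] / H[pos, 1]
print(range(ratio))
print(exp(coef(fit)[["female"]]))
```

Because the covariate values are stated explicitly in `profiles`, the result does not depend on how the fit was centered, which avoids the ambiguity about what a "mean" of a 0/1 column represents.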