On Sat, 28 Mar 2009, Bob Green wrote:
>
> Hello,
>
> I am hoping for assistance in regards to examining the contribution of
> stratified variables in a cox regression. A previous post by Terry Therneau
> noted that "That is the point of a strata; you are declaring a variable to
> NOT be proportional hazards, and thus there is no single "hazard ratio" that
> describes it". Given this purpose of stratification, in the process of
> building and testing a model, is there a way to test if the stratified
> variables do add anything to a model?

I'm not aware of any formal test for whether stratification helps. It's
difficult because you are adding an infinite-dimensional parameter to the model
(a separate baseline hazard function for each stratum), and this parameter
doesn't even appear in the partial likelihood. Nothing simple is going to work.

In principle one could compare the two stratum baseline cumulative hazards to
see if they were proportional to each other, e.g., see if the difference in
log cumulative baseline hazard was constant over time. The bootstrap is valid
for the baseline cumulative hazards, so one could get confidence intervals on a
suitable summary statistic that way.

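As a rough sketch of that comparison, the code below computes the cumulative hazard in each stratum and then the difference in log cumulative hazard over time. It assumes a model with no covariates, in which case the baseline cumulative hazard estimate reduces to the Nelson-Aalen estimator; with covariates you would use the baseline estimates from the fitted model instead. The data and all function names are hypothetical, and it is written in plain Python only to keep it self-contained.

```python
import math
from collections import Counter

def nelson_aalen(times, events):
    """Return [(t, H(t))] at each distinct event time; H is the cumulative hazard."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    cum = 0.0
    curve = []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # subjects still under observation at t
        cum += deaths[t] / at_risk
        curve.append((t, cum))
    return curve

def step_value(curve, t):
    """Value of the right-continuous step function at time t (0 before the first jump)."""
    value = 0.0
    for time, h in curve:
        if time > t:
            break
        value = h
    return value

def log_cumhaz_difference(curve_a, curve_b):
    """log H_a(t) - log H_b(t) at every event time where both hazards are positive."""
    grid = sorted({t for t, _ in curve_a} | {t for t, _ in curve_b})
    diffs = []
    for t in grid:
        ha, hb = step_value(curve_a, t), step_value(curve_b, t)
        if ha > 0 and hb > 0:
            diffs.append((t, math.log(ha) - math.log(hb)))
    return diffs

# Hypothetical toy data: follow-up time and event indicator (1 = event, 0 = censored).
stratum_a_times = [2, 3, 5, 7, 8, 11]
stratum_a_events = [1, 1, 0, 1, 1, 1]
stratum_b_times = [1, 4, 6, 9, 10, 12]
stratum_b_events = [1, 1, 1, 0, 1, 1]

curve_a = nelson_aalen(stratum_a_times, stratum_a_events)
curve_b = nelson_aalen(stratum_b_times, stratum_b_events)
differences = log_cumhaz_difference(curve_a, curve_b)
for t, d in differences:
    print(f"t={t:>2}  log H_a - log H_b = {d:+.3f}")
```

If the two strata had proportional hazards, these differences would scatter around a constant. To get a confidence interval, one could resample subjects within each stratum, recompute the differences, and summarize them with a statistic of your choosing (for example their slope against time).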
> Two variables were stratified because it was considered that the
> proportional hazards assumption was not met (via inspection of log-log plots
> where the curves crossed. I have examined. There were no cox.zph values that
> were statistically significant. I did produce plots but found these
> difficult to interpret).

There isn't much information loss in stratifying, as long as it's not
overdone, which is probably why there hasn't been much work on tests. The
main loss is that the model becomes more complicated and harder to summarize.

> The statistician I have been consulting said that in SPSS when
> variables are stratified a model is produced for each different strata (e.g.
> a separate analysis for male and female if a gender variable were
> stratified). I have not seen this approach used in R examples I have seen.

Fitting a completely separate model for each stratum is equivalent to
stratifying *and* adding an interaction with stratum to each predictor variable.
This does result in a loss of information, and is usually overkill. You can add
stratum interactions just to the variables where they are needed.

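To see why the two are equivalent, here is a sketch of the stratified partial log-likelihood when every coefficient is allowed to differ by stratum. The notation is my own and assumes no tied event times: $\beta_s$ is the coefficient vector in stratum $s$, $D_s$ is the set of subjects with an event in stratum $s$, and $R_s(t_i)$ is the set of subjects in stratum $s$ still at risk at time $t_i$.

```latex
\ell(\beta_1, \dots, \beta_S)
  = \sum_{s=1}^{S} \ell_s(\beta_s),
\qquad
\ell_s(\beta_s)
  = \sum_{i \in D_s}
    \left[ x_i^\top \beta_s
           - \log \sum_{j \in R_s(t_i)} \exp\!\left( x_j^\top \beta_s \right)
    \right]
```

Each $\ell_s$ involves only $\beta_s$ and only the data from stratum $s$, so maximizing the sum is the same as fitting $S$ separate models. Forcing a particular coefficient to be equal across strata (no interaction for that variable) is what lets the strata share information about it.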
This may be related to the collision in terminology where epidemiologists say
'stratify' to mean 'do a completely separate analysis' and
statisticians say 'stratify' to mean 'pool the stratum-specific
analyses to get an overall estimate'.
-thomas
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle