Matthias Gondan wrote:
> Dear R users,
>
> I noticed a problem with the anova command when it is applied to
> a single coxph object and there are missing observations in
> the data:
>
> This example code was run on R-2.6.1:
>
> > library(survival)
> > data(colon)
> > colondeath = colon[colon$etype==2, ]
> > m = coxph(Surv(time, status) ~ rx + sex + age + perfor,
> data=colondeath)
> > m
> Call:
> coxph(formula = Surv(time, status) ~ rx + sex + age + perfor,
> data = colondeath)
>
> coef exp(coef) se(coef) z p
> rxLev -0.028895 0.972 0.11037 -0.262 0.7900
> rxLev+5FU -0.374286 0.688 0.11885 -3.149 0.0016
> sex -0.000754 0.999 0.09431 -0.008 0.9900
> age 0.002442 1.002 0.00405 0.603 0.5500
> perfor 0.155695 1.168 0.26286 0.592 0.5500
>
> Likelihood ratio test=12.8 on 5 df, p=0.0251 n= 929
>
> > anova(m, test='Chisq')
> Analysis of Deviance Table
> Cox model: response is Surv(time, status)
> Terms added sequentially (first to last)
>
> Df Deviance Resid. Df Resid. Dev P(>|Chi|)
> NULL 929 5860.4
> rx 2 12.1 927 5848.2 2.302e-03
> sex 1 2.054e-05 926 5848.2 1.0
> age 1 0.3 925 5847.9 0.6
> perfor 1 0.3 924 5847.6 0.6
>
> Now I include nodes, which has some missing values:
>
> > m = coxph(Surv(time, status) ~ rx + sex + age + perfor + nodes,
> data=colondeath)
> > m
> Call:
> coxph(formula = Surv(time, status) ~ rx + sex + age + perfor +
> nodes, data = colondeath)
>
> coef exp(coef) se(coef) z p
> rxLev -0.08245 0.921 0.11168 -0.738 0.46000
> rxLev+5FU -0.40310 0.668 0.12054 -3.344 0.00083
> sex -0.02854 0.972 0.09573 -0.298 0.77000
> age 0.00547 1.005 0.00405 1.350 0.18000
> perfor 0.19040 1.210 0.26335 0.723 0.47000
> nodes 0.09296 1.097 0.00889 10.460 0.00000
>
> Likelihood ratio test=88.3 on 6 df, p=1.11e-16 n=911 (18 observations
> deleted due to missingness)
>
> > anova(m, test='Chisq')
> Analysis of Deviance Table
> Cox model: response is Surv(time, status)
> Terms added sequentially (first to last)
>
> Df Deviance Resid. Df Resid. Dev P(>|Chi|)
> NULL 911 5700.6
> rx 2 0.0 909 5848.2 1.0
> sex 1 2.054e-05 908 5848.2 1.0
> age 1 0.3 907 5847.9 0.6
> perfor 1 0.3 906 5847.6 0.6
> nodes 1 235.3 905 5612.3 4.253e-53
>
> The strange thing is that rx is not significant anymore.
>
> In the documentation for anova.coxph, there is a warning that
>
>
>> The comparison between two or more models by |anova| or will only be
>> valid if they are fitted to the same dataset. This may be a problem if
>> there are missing values.
>>
>>
> However, I inserted a single object to be analyzed sequentially. Is
> this a bug in R, or is it covered by the warning?
>
Notice that you also lose the 18 observations in the comparison of .~rx
with the empty model.
This is standard: losing observations on the way through an anova table
leads to madness.
What happens if you do something like
coxph(Surv(time, status) ~ rx,
data=colondeath, subset=complete.cases(nodes))
or the corresponding survdiff() call?
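
For concreteness, here is a minimal sketch of that check. It assumes a
current survival package in which the colon data are lazy-loaded, and the
names vars, cc, m0 and m1 are only illustrative. The idea is to drop the
incomplete rows once, up front, so that every line of the table is computed
on the same observations.

```r
library(survival)

# Deaths only, as in the original post
colondeath <- colon[colon$etype == 2, ]

# Keep only rows with no missing value in any variable used by the model,
# so the null model and every sequential fit see identical data
vars <- c("time", "status", "rx", "sex", "age", "perfor", "nodes")
cc <- colondeath[complete.cases(colondeath[, vars]), ]

# rx alone, and the full model, both on the complete-case data
m0 <- coxph(Surv(time, status) ~ rx, data = cc)
m1 <- coxph(Surv(time, status) ~ rx + sex + age + perfor + nodes, data = cc)

print(m0)                                             # LR test for rx alone
print(anova(m1, test = "Chisq"))                      # sequential table
print(survdiff(Surv(time, status) ~ rx, data = cc))   # log-rank test for rx
```

If the rx line of the sequential table now agrees with the likelihood ratio
test reported for m0, that would suggest the 0.0 deviance in the table above
comes from mixing fits on 929 and 911 observations rather than from rx itself.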
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907