Larry,
You found a data set that kills coxph. I'll have to think about what
to do since on the one hand it's your own fault for trying to fit a very
bad model, and on the other I'd like the routine to give a nice error
message before it dies.
In the data set you sent me the predictor variable is very skewed:
> quantile(anomaly1$CREAT, c(0, .5, .9, .999, 1))
0% 50% 90% 99.9% 100%
0.5000 1.4000 2.2000 4.6027 7.3000
There are 1000 observations, the largest is 7.3 and the second largest
is 4.6. The subject with a 7.3 was a failure. When you fit a 4 degree
of freedom polynomial to this there is enough "space" between the two
largest obvservations for the fit to turn nearly straight up and give a
probability of failure of approx 1.0 to this point. This is the actual,
true maximum likelihood solution, so the routine is doing the right
thing to try and attain it. At iteration 1 the linear predictor from
the Cox model has the following distribution.
>quantile(lp, c(0, .5, .9, .999, 1))
0% 50% 90% 99.9% 100%
-1.65321482 -1.19164206 -0.08964025 31.05389866 900.67627778
Unfortunately exp(900) = Inf, and that pretty much does in my C routine
right there; within another iteration there are NA values littered about
and the return value for the coefficients is all NA.
If instead you fit a smoothing spline to the data
coxph(Surv(TTFAIL, FFAIL) ~ pspline(CREAT6MO, df=4), data=anomaly1)
the fitted response is essentially linear from 0 to 5 before turning
upward, but it never gets outrageous due to the shrinkage towards
linearity.
Catching this type of issue with an extremely skewed variable
(creatinine^4) would be very hard to do reliably.
The primary message is that polynomials and biology are a bad mix.
Terry Therneau