David Winsemius
2008-Nov-03 15:38 UTC
[R] NaN causes "error in fitter" with cph.calibrate from pkg Design
I have been attempting to use cph models to get better calibration of my models for which I had originally used logistic regression. I tried running with 40 repetitions and got an error. I then tried 500 repetitions (thinking that the NaNs in the output below might be caused by that choice) and then let my computer crunch for several hours and got only the same error message and warnings for my output. Since the dataset is large and has caused memory overflow in the past (despite having 4 GB in a UNIX OS), I had previously saved it and loaded it back into a clean run of R. The memory overflow did not occur with this approach, but it appears that the NaNs at the "high" end of the survival probability scale are causing fatal errors with the fitter. It appears that for half of the samples, an n of 40 is being used while for the other other half an adaptive sampling size is being used. I hope I have included enough detail below. Any thoughts on a way out of this? Would the full screen output be useful in construction a calibration curve? (I have copied it to a file.) Would there be a way of adaptively increasing the number of subjects in the sampling at the higher values of x (which I take to be a survival fraction at the interval being examined)? Or perhaps there would be a way of specifying a larger sampling fraction for all runs? I have gone through the documentation and have not identified a parameter that would do so. I tried looking at the code of cph.calibrate, but could not figure out how the sample size of 40 was established. Sincerely; David Winsemius ---the call------and trimmed output --- and after that, system info and str(<model>) cal <- calibrate( cph.A.S.H.T.G.S.D, m=40, method="boot", B=50, u=4) Using Cox survival estimates at 1 Days Averaging 5 repetitions of B= 10 x n events KM std.err [1,] 0.8766 41 18 0.5261 0.26987 [2,] 0.9359 39 10 0.7453 0.34283 [3,] 0.9416 40 7 0.8247 0.42187 [4,] 0.9466 40 6 0.8610 0.45340 [5,] 0.9499 40 7 0.9041 0.59887 [6,] 0.9531 40 7 0.8337 0.50418 [7,] 0.9552 40 7 0.8536 0.46862 trimmed output in middle [54,] 0.9850 1000 84 0.9227 0.12758 [55,] 0.9855 40 1 0.9688 1.00004 [56,] 0.9860 1240 91 0.9349 0.12622 [57,] 0.9865 40 2 0.9500 0.70718 [58,] 0.9870 1520 103 0.9382 0.12124 [59,] 0.9875 40 3 0.9677 1.00004 [60,] 0.9880 1960 129 0.9402 0.10282 [61,] 0.9885 40 1 0.9565 1.00008 [62,] 0.9890 2520 143 0.9530 0.10141 [63,] 0.9895 40 3 0.9750 1.00003 [64,] 0.9900 3320 179 0.9544 0.09349 [65,] 0.9905 40 1 1.0000 NaN [66,] 0.9910 4200 212 0.9571 0.08377 [67,] 0.9915 40 1 1.0000 NaN [68,] 0.9920 5921 264 0.9617 0.07440 [69,] 0.9925 40 0 1.0000 NaN [70,] 0.9930 7960 330 0.9650 0.06637 [71,] 0.9935 40 1 1.0000 NaN [72,] 0.9940 12160 436 0.9695 0.05722 [73,] 0.9945 40 0 1.0000 NaN [74,] 0.9950 18281 539 0.9777 0.05360 [75,] 0.9955 40 1 0.9750 1.00003 [76,] 0.9960 30841 666 0.9831 0.04845 [77,] 0.9965 40 1 1.0000 NaN [78,] 0.9971 55002 951 0.9863 0.03981 [79,] 0.9975 40 0 1.0000 NaN [80,] 0.9981 114005 1317 0.9909 0.03398 [81,] 0.9985 40 0 1.0000 NaN [82,] 0.9991 366094 1790 0.9961 0.02874 [83,] 0.9995 40 1 0.9750 1.00003 [84,] 0.9997 253649 518 0.9984 0.05316 [85,] 0.9999 40 0 1.0000 NaN Error in fitter(X, Y, strata = Strata, offset = offset, weights = weights, : NA/NaN/Inf in foreign function call (arg 6) In addition: There were 50 or more warnings (use warnings() to see the first 50) er(X, Y, strata = Strata, offset = offset, weights = weights, : The warnings are of two types. Here are the first ten: > warnings() Warning messages: 1: In groupkm(cox, Surv(y[, 1], y[, 2]), u = u, cuts = orig.cuts) ... : one interval had < 2 observations 2: In groupkm(cox, Surv(y[, 1], y[, 2]), u = u, cuts = orig.cuts) ... : one interval had < 2 observations 3: In groupkm(cox, Surv(y[, 1], y[, 2]), u = u, cuts = orig.cuts) ... : one interval had < 2 observations 4: In train.stat + train.statj ... : longer object length is not a multiple of shorter object length 5: In test.stat + test.statj * wt ... : longer object length is not a multiple of shorter object length 6: In groupkm(cox, Surv(y[, 1], y[, 2]), u = u, cuts = orig.cuts) ... : one interval had < 2 observations 7: In groupkm(cox, Surv(y[, 1], y[, 2]), u = u, cuts = orig.cuts) ... : one interval had < 2 observations 8: In groupkm(cox, Surv(y[, 1], y[, 2]), u = u, cuts = orig.cuts) ... : one interval had < 2 observations 9: In groupkm(cox, Surv(y[, 1], y[, 2]), u = u, cuts = orig.cuts) ... : one interval had < 2 observations 10: In num + (!na) ... : longer object length is not a multiple of shorter object length > .Platform $OS.type [1] "unix" $file.sep [1] "/" $dynlib.ext [1] ".so" $GUI [1] "AQUA" $endian [1] "little" $pkgType [1] "mac.binary" $path.sep [1] ":" $r_arch [1] "i386" > str(cph.A.S.H.T.G.S.D) List of 30 $ coefficients : Named num [1:16] 0.0588 0.0296 0.1848 -0.0436 0.0838 ... ..- attr(*, "names")= chr [1:16] "Age" "Age'" "Sex=Male" "BL_HDL.A" ... $ var : num [1:16, 1:16] 6.21e-06 -5.83e-06 -1.43e-06 1.51e-06 -1.03e-06 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:16] "Age" "Age'" "Sex=Male" "BL_HDL.A" ... .. ..$ : chr [1:16] "Age" "Age'" "Sex=Male" "BL_HDL.A" ... $ loglik : num [1:2] -110975 -105680 $ score : num 17894 $ iter : int 9 $ linear.predictors: Named num [1:886393] -0.333 -0.710 1.028 -1.087 1.127 ... ..- attr(*, "names")= chr [1:886393] "1" "2" "3" "4" ... $ residuals : Named num [1:886393] -0.01252 -0.00859 -0.04884 -0.00589 -0.05392 ... ..- attr(*, "names")= chr [1:886393] "1" "2" "3" "4" ... $ means : num [1:16] 43.472 4.162 0.563 52.815 8.503 ... $ terms :Classes 'terms', 'formula' length 3 srv900 ~ rcs(Age, c(35, 50, 65)) + Sex + rcs(BL_HDL.A, 3) * rcs(BL_CHOLEST.A, 3) + rcs(BL_GGT.A, 3) + BL_TRIGLYC.A + rcs(BL_GGT.A, 3) + ... .. ..- attr(*, "variables")= language list(srv900, rcs(Age, c(35, 50, 65)), Sex, rcs(BL_HDL.A, 3), rcs(BL_CHOLEST.A, 3), rcs(BL_GGT.A, 3), BL_TRIGLYC.A, BP.B, ... .. ..- attr(*, "factors")= int [1:9, 1:9] 0 1 0 0 0 0 0 0 0 0 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : chr [1:9] "srv900" "rcs(Age, c(35, 50, 65))" "Sex" "rcs(BL_HDL.A, 3)" ... .. .. .. ..$ : chr [1:9] "rcs(Age, c(35, 50, 65))" "Sex" "rcs(BL_HDL.A, 3)" "rcs(BL_CHOLEST.A, 3)" ... .. ..- attr(*, "term.labels")= chr [1:9] "rcs(Age, c(35, 50, 65))" "Sex" "rcs(BL_HDL.A, 3)" "rcs(BL_CHOLEST.A, 3)" ... .. ..- attr(*, "specials")=Dotted pair list of 2 .. .. ..$ strat : NULL .. .. ..$ cluster: NULL .. ..- attr(*, "order")= int [1:9] 1 1 1 1 1 1 1 1 2 .. ..- attr(*, "intercept")= int 1 .. ..- attr(*, "response")= int 1 .. ..- attr(*, ".Environment")=<R_GlobalEnv> $ n : 'table' int [, 1:2] 877967 8426 ..- attr(*, "dimnames")=List of 1 .. ..$ Status: chr [1:2] "No Event" "Event" $ call : language cph(formula = srv900 ~ rcs(Age, c(35, 50, 65)) + Sex + rcs(BL_HDL.A, 3) * rcs(BL_CHOLEST.A, 3) + rcs(BL_GGT.A, 3) + BL_TRIGLYC.A + ... $ Design :List of 11 ..$ name : chr [1:9] "Age" "Sex" "BL_HDL.A" "BL_CHOLEST.A" ... ..$ label : chr [1:9] "Age" "Sex" "BL_HDL.A" "BL_CHOLEST.A" ... ..$ units : Named chr [1:8] "" "" "" "" ... .. ..- attr(*, "names")= chr [1:8] "Age" "Sex" "BL_HDL.A" "BL_CHOLEST.A" ... ..$ colnames : chr [1:16] "Age" "Age'" "Sex=Male" "BL_HDL.A" ... ..$ assume : chr [1:9] "rcspline" "category" "rcspline" "rcspline" ... ..$ assume.code : int [1:9] 4 5 4 4 4 1 1 1 9 ..$ parms :List of 6 .. ..$ Age : num [1:3] 35 50 65 .. ..$ Sex : chr [1:2] "Female" "Male" .. ..$ BL_HDL.A : num [1:3] 35 51 73 .. ..$ BL_CHOLEST.A : num [1:3] 154 199 253 .. ..$ BL_GGT.A : num [1:3] 11 22 55 .. ..$ BL_HDL.A * BL_CHOLEST.A: num [1:3, 1:5] 3 4 0 0 0 0 1 0 0 0 ... .. .. ..- attr(*, "dimnames")=List of 2 .. .. .. ..$ : NULL .. .. .. ..$ : chr [1:5] "parms" "" "" "" ... ..$ limits : list() ..$ values : list() ..$ nonlinear :List of 9 .. ..$ Age : logi [1:2] FALSE TRUE .. ..$ Sex : logi FALSE .. ..$ BL_HDL.A : logi [1:2] FALSE TRUE .. ..$ BL_CHOLEST.A : logi [1:2] FALSE TRUE .. ..$ BL_GGT.A : logi [1:2] FALSE TRUE .. ..$ BL_TRIGLYC.A : logi FALSE .. ..$ BP.B : logi FALSE .. ..$ BP.A : logi FALSE .. ..$ BL_HDL.A * BL_CHOLEST.A: logi [1:4] FALSE TRUE TRUE TRUE ..$ interactions: num [1:3, 1] 3 4 0 $ assign :List of 9 ..$ rcs(Age, c(35, 50, 65)) : int [1:2] 1 2 ..$ Sex : int 3 ..$ rcs(BL_HDL.A, 3) : int [1:2] 4 5 ..$ rcs(BL_CHOLEST.A, 3) : int [1:2] 6 7 ..$ rcs(BL_GGT.A, 3) : int [1:2] 8 9 ..$ BL_TRIGLYC.A : int 10 ..$ BP.B : int 11 ..$ BP.A : int 12 ..$ rcs(BL_HDL.A, 3):rcs(BL_CHOLEST.A, 3): int [1:4] 13 14 15 16 $ na.action :List of 3 ..$ nmiss : Named int [1:9] 780 9389 3891 2405 2404 3055 2677 10623 10328 .. ..- attr(*, "names")= chr [1:9] "srv900" "Age" "Sex" "BL_HDL.A" ... ..$ omit : Named int [1:24266] 20 29 53 75 82 84 119 160 201 230 ... .. ..- attr(*, "names")= chr [1:24266] "20" "29" "53" "75" ... ..$ na.detail.response: NULL ..- attr(*, "class")= chr "delete" $ fail : logi FALSE $ non.slopes : num 0 $ stats : Named num [1:8] 886393 8426 10590 16 0 ... ..- attr(*, "names")= chr [1:8] "Obs" "Events" "Model L.R." "d.f." ... $ method : chr "efron" $ maxtime : num 9.95 $ time.inc : num 1 $ units : chr "Day" $ fitFunction : chr [1:2] "cph" "coxph" $ center : num 0.82 $ scale.pred : chr [1:2] "log Relative Hazard" "Hazard Ratio" $ x : num [1:886393, 1:16] 35.1 27.0 56.4 22.5 56.0 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:886393] "1" "2" "3" "4" ... .. ..$ : chr [1:16] "Age" "Age'" "Sex=Male" "BL_HDL.A" ... $ y : Surv [1:886393, 1:2] 9.94661+ 9.90007+ 9.84805+ 9.85900+ 9.86448+ 9.85900+ 9.88364+ 9.85079+ 9.79329+ 9.82615+ ... ..- attr(*, "dimnames")=List of 2 .. ..$ : chr [1:886393] "1" "2" "3" "4" ... .. ..$ : chr [1:2] "time" "status" ..- attr(*, "type")= chr "right" ..- attr(*, "units")= chr "Day" ..- attr(*, "time.label")= chr "surv.yr" ..- attr(*, "event.label")= chr "cens" $ time : num [1:2599] 0.00000 0.00000 0.00274 0.00548 0.00821 ... $ surv : atomic [1:2599] 1 1 1 1 1 ... ..- attr(*, "type")= chr "efron" $ std.err : num [1:2599] NA 0.0483 0.0482 0.0482 0.0479 ... $ surv.summary : num [1:10, 1, 1:3] 1.000 0.999 0.998 0.997 0.996 ... ..- attr(*, "dimnames")=List of 3 .. ..$ : chr [1:10] "0" "1" "2" "3" ... .. ..$ : chr "Stratum 1" .. ..$ : chr [1:3] "Survival" "n.risk" "std.err" - attr(*, "class")= chr [1:3] "cph" "Design" "coxph"