I was unsure which goodness-of-fit tests exist in R for logistic regression. After searching the R-help archive I found that the Design package's models, together with resid, can be used to calculate one as follows:

    d <- datadist(mydataframe)
    options(datadist = 'd')
    fit <- lrm(response ~ predictor1 + predictor2..., data=mydataframe, x=TRUE, y=TRUE)
    resid(fit, 'gof')

I set up a script that first uses glm to create models and then uses stepAIC to determine the optimal model. I used this instead of fastbw because I found the AIC values to be completely different and the final models didn't always match. The script then takes the reduced model formula and refits it using lrm as above. The problem is that for some models I run into an error to which I can find no reference whatsoever on the mailing list or on the web. It is as follows:

    test.lrm <- lrm(cclo ~ elev + aspect + cti_var + planar + feat_div + loamy + sands + sandy + wet + slr_mean, data=datamatrix, x=TRUE, y=TRUE)
    singular information matrix in lrm.fit (rank= 10 ). Offending variable(s):
    slr_mean
    Error in j:(j + params[i] - 1) : NA/NaN argument

If I specify the singularity criterion (tol) and make it smaller than the default of 1E-7, for example 1E-9, or 1E-12 (the default in calibrate), it works. Why is that?

Not being a statistician but a biogeographer using regression as a tool, I don't really understand what is happening here.

Does changing the tol value change how I should interpret goodness-of-fit results or other evaluations of the models created?

I've included a summary of the data below (in case it might be helpful) with all variables in the data frame, as that was easier than selecting out the ones used in the model.

Thanks in advance.

T

--
Trevor Wiens
twiens at interbaun.com

The significant problems that we face cannot be solved at the same level of thinking we were at when we created them. (Albert Einstein)

----------------------------

summary(datamatrix)

     siteid          block         recordyear        cclo
 564-125:   5   Min.   :1.000   Min.   :2000   Min.   :0.0000
 564-130:   5   1st Qu.:2.000   1st Qu.:2001   1st Qu.:1.0000
 564-135:   5   Median :3.000   Median :2002   Median :1.0000
 564-140:   5   Mean   :3.042   Mean   :2002   Mean   :0.7509
 564-145:   5   3rd Qu.:4.000   3rd Qu.:2003   3rd Qu.:1.0000
 564-150:   5   Max.   :5.000   Max.   :2004   Max.   :1.0000
 (Other):1098

      elev            slope            aspect          slr_mean
 Min.   :0.0000   Min.   :0.1499   Min.   :0.0000   Min.   :7681
 1st Qu.:0.0000   1st Qu.:0.5876   1st Qu.:0.0000   1st Qu.:7852
 Median :1.0000   Median :0.9195   Median :0.0000   Median :7877
 Mean   :0.6259   Mean   :1.2523   Mean   :0.2482   Mean   :7871
 3rd Qu.:1.0000   3rd Qu.:1.6694   3rd Qu.:0.0000   3rd Qu.:7892
 Max.   :1.0000   Max.   :5.3366   Max.   :1.0000   Max.   :7981

      cti           cti_var           planar          feat_div
 Min.   :7.157   Min.   :0.4497   Min.   :0.0000   Min.   :1.000
 1st Qu.:7.651   1st Qu.:0.6187   1st Qu.:1.0000   1st Qu.:2.000
 Median :7.720   Median :0.8495   Median :1.0000   Median :3.000
 Mean   :7.763   Mean   :0.9542   Mean   :0.8254   Mean   :3.379
 3rd Qu.:7.822   3rd Qu.:1.1918   3rd Qu.:1.0000   3rd Qu.:4.000
 Max.   :8.769   Max.   :2.5615   Max.   :1.0000   Max.   :6.000

    chop_san           loamy            sands            sandy
 Min.   :0.00000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.00000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000
 Median :0.00000   Median :0.0000   Median :0.0000   Median :0.0000
 Mean   :0.05762   Mean   :0.3094   Mean   :0.3236   Mean   :0.1099
 3rd Qu.:0.00000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.0000
 Max.   :1.00000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000

      wet          timesinceburn         ndvi             evi
 Min.   :0.00000   Min.   :  1.00   Min.   :0.1140   Min.   :0.1041
 1st Qu.:0.00000   1st Qu.:100.00   1st Qu.:0.2973   1st Qu.:0.1667
 Median :0.00000   Median :100.00   Median :0.3342   Median :0.2027
 Mean   :0.01950   Mean   : 87.84   Mean   :0.3629   Mean   :0.2184
 3rd Qu.:0.00000   3rd Qu.:100.00   3rd Qu.:0.4463   3rd Qu.:0.2711
 Max.   :1.00000   Max.   :100.00   Max.   :0.5932   Max.   :0.4788

     msavi2              fc              gdd            precip
 Min.   :0.09156   Min.   :0.1552   Min.   :380.6   Min.   : 50.04
 1st Qu.:0.14936   1st Qu.:0.3246   1st Qu.:492.8   1st Qu.: 76.17
 Median :0.18257   Median :0.4082   Median :500.8   Median : 85.50
 Mean   :0.19653   Mean   :0.4398   Mean   :476.4   Mean   : 94.35
 3rd Qu.:0.24626   3rd Qu.:0.5630   3rd Qu.:501.6   3rd Qu.: 95.16
 Max.   :0.33258   Max.   :0.6996   Max.   :519.7   Max.   :163.86

    precip_1        precip_2         slr_yr
 Min.   :164.2   Min.   :164.2   Min.   :7417
 1st Qu.:254.2   1st Qu.:254.2   1st Qu.:7704
 Median :338.0   Median :357.1   Median :7775
 Mean   :298.1   Mean   :301.5   Mean   :7828
 3rd Qu.:357.1   3rd Qu.:360.5   3rd Qu.:8014
 Max.   :414.2   Max.   :414.2   Max.   :8151
On Thu, 10 Mar 2005, Trevor Wiens wrote:

> I was unsure of what suitable goodness-of-fit tests existed in R for
> logistic regression. After searching the R-help archive I found that
> using the Design models and resid, could be used to calculate this as
> follows:
>
> d <- datadist(mydataframe)
> options(datadist = 'd')
> fit <- lrm(response ~ predictor1 + predictor2..., data=mydataframe, x =T, y=T)
> resid(fit, 'gof').
>
> I set up a script to first use glm to create models use stepAIC to
> determine the optimal model. I used this instead of fastbw because I
> found the AIC values to be completely different and the final models
> didn't always match. Then my script takes the reduced model formula and
> recreates it using lrm as above. Now the problem is that for some models
> I run into an error to which I can find no reference whatsoever on the
> mailing list or on the web. It is as follows:
>
> test.lrm <- lrm(cclo ~ elev + aspect + cti_var + planar + feat_div + loamy + sands + sandy + wet + slr_mean, data=datamatrix, x = T, y = T)
> singular information matrix in lrm.fit (rank= 10 ). Offending variable(s):
> slr_mean
> Error in j:(j + params[i] - 1) : NA/NaN argument
>
>
> Now if I add the singularity criterion and make the value smaller than
> the default of 1E-7 to 1E-9 or 1E-12 which is the default in calibrate,
> it works. Why is that?
>
> Not being a statistician but a biogeographer using regression as a tool,
> I don't really understand what is happening here.

From one geographer to another, and being prepared to bow to better-founded explanations, you seem to have included a variable - the offending variable slr_mean - that is very highly correlated with another. Making the tolerance tighter says that you are prepared to take the risk of confounding your results. You've already "been fishing" for right hand side variables anyway, so your results are somewhat prejudiced, aren't they?
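The suggestion above can be checked with a few lines of base R. The sketch below uses simulated stand-ins: the names `elev`, `slr`, `y` and all values are hypothetical, not the poster's data. It also assumes, without the thread confirming it, that a predictor with a large mean and a small spread (as slr_mean appears to be in the summary) could contribute to a near-singular information matrix. It looks at pairwise correlation first, then compares the condition number of X'X before and after centring, and finally confirms that centring leaves the fitted probabilities unchanged.

```r
# Simulated stand-ins only; none of these values come from the thread.
set.seed(42)
n    <- 1000
elev <- rbinom(n, 1, 0.6)                 # 0/1 indicator
slr  <- rnorm(n, mean = 7871, sd = 40)    # large mean, small spread
y    <- rbinom(n, 1, plogis(0.5 + 0.8 * elev))

# 1. Pairwise correlation between predictors (near +/-1 signals redundancy).
print(round(cor(cbind(elev, slr)), 3))

# 2. Condition number of X'X with the raw and the centred predictor.
#    Large values mean the matrix is close to singular numerically.
slr_c <- slr - mean(slr)
k_raw <- kappa(crossprod(cbind(1, elev, slr)),   exact = TRUE)
k_ctr <- kappa(crossprod(cbind(1, elev, slr_c)), exact = TRUE)
cat("condition number, raw:    ", format(k_raw, digits = 3), "\n")
cat("condition number, centred:", format(k_ctr, digits = 3), "\n")

# 3. Centring only shifts the intercept, so fitted probabilities agree.
f_raw <- glm(y ~ elev + slr,   family = binomial)
f_ctr <- glm(y ~ elev + slr_c, family = binomial)
same_fit <- isTRUE(all.equal(unname(fitted(f_raw)), unname(fitted(f_ctr)),
                             tolerance = 1e-6))
print(same_fit)
```

On real data, a correlation close to 1 between two predictors would point towards dropping or combining one of them, whereas a large condition number that disappears after centring would point towards a scaling issue. Which of the two applies to the model in this thread is not established here.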
I think you may also like to review which of the right hand side variables should be treated as factors rather than numeric (looking at the summary suggests that many are factors), and perhaps the dependent variable too, although lrm() seems to take care of this if you haven't.

> Does changing the tol variable, change how I should interpret
> goodness-of-fit results or other evaluations of the models created?
>
> I've included a summary of the data below (in case it might be helpful)
> with all variables in the data frame as it was easier than selecting out
> the ones used in the model.
>
> Thanks in advance.
>
> T
>

--
Roger Bivand
Economic Geography Section, Department of Economics,
Norwegian School of Economics and Business Administration,
Breiviksveien 40, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
Trevor Wiens wrote:

> I was unsure of what suitable goodness-of-fit tests existed in R for logistic regression. After searching the R-help archive I found that using the Design models and resid, could be used to calculate this as follows:
>
> d <- datadist(mydataframe)
> options(datadist = 'd')
> fit <- lrm(response ~ predictor1 + predictor2..., data=mydataframe, x =T, y=T)
> resid(fit, 'gof').
>
> I set up a script to first use glm to create models use stepAIC to determine the optimal model. I used this instead of fastbw because I found the AIC values to be completely different and the final models didn't always match. Then my script takes the reduced model formula and recreates it using lrm as above. Now the problem is that for some models I run into an error to which I can find no reference whatsoever on the mailing list or on the web. It is as follows:
>
> test.lrm <- lrm(cclo ~ elev + aspect + cti_var + planar + feat_div + loamy + sands + sandy + wet + slr_mean, data=datamatrix, x = T, y = T)
> singular information matrix in lrm.fit (rank= 10 ). Offending variable(s):
> slr_mean
> Error in j:(j + params[i] - 1) : NA/NaN argument
>
>
> Now if I add the singularity criterion and make the value smaller than the default of 1E-7 to 1E-9 or 1E-12 which is the default in calibrate, it works. Why is that?
>
> Not being a statistician but a biogeographer using regression as a tool, I don't really understand what is happening here.
>
> Does changing the tol variable, change how I should interpret goodness-of-fit results or other evaluations of the models created?
>
> I've included a summary of the data below (in case it might be helpful) with all variables in the data frame as it was easier than selecting out the ones used in the model.
>
> Thanks in advance.
>
> T

The goodness of fit test only works on prespecified models. It is not valid when stepwise variable selection is used (unless perhaps you use alpha=0.5).
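A small simulation can illustrate why statistics computed after stepwise selection are hard to trust. This is a sketch on made-up data, not the poster's: every predictor is pure noise by construction, and base R's `step()` stands in for the `stepAIC` call mentioned in the thread (an assumption made to keep the example free of extra packages). The names `dat`, `full`, `sel` and `kept` are illustrative.

```r
# Pure-noise data: y is unrelated to every predictor by construction.
set.seed(1)
n <- 200
p <- 15
dat <- as.data.frame(matrix(rnorm(n * p), n, p))
names(dat) <- paste0("x", seq_len(p))
dat$y <- rbinom(n, 1, 0.5)

# Full logistic model, then backward selection by AIC (stats::step).
full <- glm(y ~ ., data = dat, family = binomial)
sel  <- step(full, direction = "backward", trace = 0)

# Which noise predictors survived the search?
kept <- attr(terms(sel), "term.labels")
cat("noise predictors retained:", length(kept), "of", p, "\n")

# Wald p-values of whatever survived; they take no account of the search.
print(round(summary(sel)$coefficients, 3))
```

Any predictors that remain were chosen because they happened to fit this sample, so tests computed afterwards on the same data treat the selected model as if it had been specified in advance, which it was not.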
--
Frank E Harrell Jr
Professor and Chair, School of Medicine
Department of Biostatistics, Vanderbilt University