d_chall
2011-May-16 15:37 UTC
[R] Logistic regression model returns lower than expected logit
Hi all,

I'm using a logistic regression model (created with 'glm') with 3 variables to separate true positives from errors in a data set. All in all it seems to perform quite well, but for some reason the logit values seem to be much lower than they should be. What I mean is that in order to get ~90% sensitivity and ~90% precision I have to set my logit cutoff at around -1 or 0. From my (very limited) understanding, a logit cutoff of 0 should give you around 50% precision (half your final data set is TP, half is FP). I get this effect even when I run the model on the same data it was trained on.

My only idea for a cause of this so far is that my training data set had roughly 10x as many true-negative data points as true-positive data points, but evening them out didn't seem to fix the problem much.

Here is my model summary with output from R's glm:

====================================
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.48817  -0.17130  -0.10221  -0.05374   3.36833

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.85666    0.33868  -2.529 0.011425 *
var1         1.08770    0.15364   7.080 1.45e-12 ***
var2         0.67537    0.08003   8.439  < 2e-16 ***
var3        -1.25332    0.33595  -3.731 0.000191 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1230.63  on 2034  degrees of freedom
Residual deviance:  341.81  on 2031  degrees of freedom
====================================

thanks in advance!

--
View this message in context: http://r.789695.n4.nabble.com/Logistic-regression-model-returns-lower-than-expected-logit-tp3526542p3526542.html
Sent from the R help mailing list archive at Nabble.com.
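[For what it's worth, a small self-contained simulation (my own invented data, not the poster's) illustrates why a logit cutoff of 0 need not give ~50% precision: a cutoff of 0 means a predicted probability of 0.5, but precision at any cutoff also depends on the class base rate and on how well separated the classes are. With ~10:1 negatives to positives, the cutoff that balances sensitivity and precision can easily sit below 0:]

```r
## Simulated data with roughly 10x as many negatives as positives,
## mimicking the imbalance described in the post (all names here are mine).
set.seed(1)
n_pos <- 200
n_neg <- 2000
x <- c(rnorm(n_pos, mean = 2), rnorm(n_neg, mean = 0))  # one predictor
y <- c(rep(1, n_pos), rep(0, n_neg))                    # 1 = true positive

fit <- glm(y ~ x, family = binomial)
logit <- predict(fit, type = "link")   # linear predictor on the logit scale

## Precision and sensitivity at several logit cutoffs, on the training data.
for (cut in c(-2, -1, 0, 1)) {
  pred_pos    <- logit > cut
  precision   <- sum(y == 1 & pred_pos) / sum(pred_pos)
  sensitivity <- sum(y == 1 & pred_pos) / sum(y == 1)
  cat(sprintf("cutoff %+d: precision %.2f, sensitivity %.2f\n",
              cut, precision, sensitivity))
}
```

[Running something like this typically shows precision well above 50% at cutoff 0, with sensitivity climbing as the cutoff drops, which matches the behaviour described rather than indicating a bug in glm.]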