I am creating a model attempting to predict the probability that someone will reoffend after being caught for a crime. There are seven total inputs and I planned on using a logistic regression. I started with a null deviance of 182.91 and ended up with a residual deviance of 83.40 after accounting for different interactions and such. However, I realized afterwards that my code is different from that in my book, and I can't figure out what I need to put in its place. Here's my code:

library(foreign)
library(car)
foo = read.table("C:/Documents and Settings/Chris/Desktop/4330/criminals.dat", header=TRUE)
reoff = foo[ ,1]
race = foo[ ,2]
age = foo[ ,3]
gender = foo[ ,4]
educ = foo[ ,5]
subst = foo[ ,6]
prior = foo[ ,7]
violence = foo[ ,8]
fit1h = glm(reoff ~ factor(subst) + factor(violence) + prior +
    factor(violence):factor(subst) + factor(violence):factor(educ) +
    factor(violence):factor(age) + factor(violence):factor(prior))
summary(fit1h)

If you noticed, there's no part of my code that looks like:

family=binomial(link="logit"))

If I code like my book has done, it would look like:

fit1i = glm(reoff ~ factor(subst) + factor(violence) + prior +
    factor(violence):factor(subst) + factor(violence):factor(educ) +
    factor(violence):factor(age) + factor(violence):factor(prior),
    family=binomial(link="logit"))
summary(fit1i)

However, when I do this, my null deviance is 1104 and my residual deviance is 23460. THIS IS A HUGE DIFFERENCE IN MODEL FIT! I'm not sure if I have to redo my model or if my book was simply doing the "family=binomial(link="logit")" for a specific problem/reason.

So, to my question: Do I need to include "family=binomial(link="logit")" in my code? Do I need to include any type of family?

Thanks for your help,
-chris
On Sat, 2006-10-21 at 20:02 -0400, Chris Linton wrote:

> I am creating a model attempting to predict the probability someone will
> reoffend after being caught for a crime. There are seven total inputs and I
> planned on using a logistic regression. I started with a null deviance of
> 182.91 and ended up with a residual deviance of 83.40 after accounting for
> different interactions and such. However, I realized after that my code is
> different from that in my book. And I can't figure out what I need to put
> in it's place. Here's my code:
>
> library(foreign)
> library(car)
> foo = read.table("C:/Documents and
> Settings/Chris/Desktop/4330/criminals.dat", header=TRUE)
>
> reoff = foo[ ,1]
> race = foo[ ,2]
> age = foo[ ,3]
> gender = foo[ ,4]
> educ = foo[ ,5]
> subst = foo[ ,6]
> prior = foo[ ,7]
> violence = foo[ ,8]
>
> fit1h = glm(reoff ~ factor(subst) + factor(violence) + prior +
> factor(violence):factor(subst) + factor(violence):factor(educ) +
> factor(violence):factor(age) + factor(violence):factor(prior))
>
> summary(fit1h)
>
> If you noticed, there's no part of my code that looks like:
>
> family=binomial(link="logit"))
>
> If I code like my book has done, it would look like:
>
> fit1i = glm(reoff ~ factor(subst) + factor(violence) + prior +
> factor(violence):factor(subst) + factor(violence):factor(educ) +
> factor(violence):factor(age) + factor(violence):factor(prior),
> family=binomial(link="logit"))
>
> summary(fit1i)
>
> However, when I do this, my null deviance is 1104 and my residual deviance
> is 23460. THIS IS A HUGE DIFFERENCE IN MODEL FIT! I'm not sure if I have
> to redo my model or if my book was simply doing the
> "family=binomial(link="logit")" for a specific problem/reason.
>
> So, to my question:
> Do I need to include "family=binomial(link="logit")" in my code?

Yes, though you could do with just 'family = binomial' since logit is the default link function.

> Do I need
> to include any type of family?

If you don't want to use the default Gaussian family, then yes.

Whatever book it is you are working from (which you fail to identify) ought to clearly explain the background on the use of the distribution families in GLMs. There is a reason the author has included these instructions and you need to pay attention to them.

If you look carefully at the output of summary(fit1h), you will likely see:

(Dispersion parameter for gaussian family taken to be ....)

and you will also notice that the tests being applied (3rd and 4th columns in the coefficient summary table) are t tests and not z tests.

These should be a big hint that you are not working with the proper family and are therefore not fitting a logistic regression model, which is presumably the intent of this section of the book.

See ?glm and pay careful attention to the function defaults.

HTH,

Marc Schwartz
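To see the points above on something runnable, here is a minimal sketch on simulated data. It does not use the poster's criminals.dat; the variables `prior` and `reoff` below are made up for illustration and only borrow the names from the original post. It shows that glm() falls back to the gaussian family when `family` is omitted, that `family = binomial` implies the logit link, and that the coefficient table switches from t tests to z tests.

```r
# Simulated data for illustration only (NOT the poster's data set).
set.seed(1)
n     <- 200
prior <- rpois(n, 2)                              # hypothetical number of prior offences
reoff <- rbinom(n, 1, plogis(-1 + 0.5 * prior))   # 0/1 outcome

fit_default  <- glm(reoff ~ prior)                     # no family given: gaussian
fit_binomial <- glm(reoff ~ prior, family = binomial)  # logistic regression

family(fit_default)$family     # "gaussian"
family(fit_binomial)$family    # "binomial"
family(fit_binomial)$link      # "logit"

# The coefficient tables use different tests
colnames(coef(summary(fit_default)))    # contains "t value"
colnames(coef(summary(fit_binomial)))   # contains "z value"

# Under the gaussian family the deviance is just the residual sum of squares,
# so it is not on the same scale as a binomial deviance.
rss <- sum(residuals(fit_default, type = "response")^2)
all.equal(deviance(fit_default), rss)
```

Because the two families define deviance differently, the deviances from fit1h and fit1i in the original post cannot be compared directly as measures of fit.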
Chris Linton <connect.chris <at> gmail.com> writes:

>
> I am creating a model attempting to predict the probability someone will
> reoffend after being caught for a crime. There are seven total inputs and I
> planned on using a logistic regression. I started with a null deviance of
> 182.91 and ended up with a residual deviance of 83.40 after accounting for
> different interactions and such. However, I realized after that my code is
> different from that in my book. And I can't figure out what I need to put
> in it's place. Here's my code:
>...
> fit1h = glm(reoff ~ factor(subst) + factor(violence) + prior +
> factor(violence):factor(subst) + factor(violence):factor(educ) +
> factor(violence):factor(age) + factor(violence):factor(prior))
>
> summary(fit1h)
>
> If you noticed, there's no part of my code that looks like:
>
> family=binomial(link="logit"))
>...
>
> However, when I do this, my null deviance is 1104 and my residual deviance
> is 23460. THIS IS A HUGE DIFFERENCE IN MODEL FIT! I'm not sure if I have
> to redo my model or if my book was simply doing the
> "family=binomial(link="logit")" for a specific problem/reason.

You state that you model the *probability* that ... Then family=gaussian, which is the default data-generating model in glm, is not appropriate. Yes, you need to use family=binomial(link="logit") or family=binomial(link="probit"), but you also need to take care to specify your y properly in the glm call.

Gregor
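As a rough illustration of what specifying y properly can look like, the sketch below uses simulated data. The names `x`, `y01` and `yfac` are invented for this example and are not from the original post. For a binary outcome, the binomial family accepts a numeric 0/1 vector or a factor whose first level is treated as failure. It is worth checking how the response is actually coded before fitting.

```r
# Simulated data for illustration only.
set.seed(2)
n   <- 100
x   <- rnorm(n)
y01 <- rbinom(n, 1, plogis(0.3 + x))      # numeric response coded 0/1

# Check the coding of the response before fitting
table(y01)
stopifnot(all(y01 %in% c(0, 1)))

# Equivalent factor response: first level ("no") = failure, "yes" = success
yfac <- factor(ifelse(y01 == 1, "yes", "no"), levels = c("no", "yes"))

fit_num    <- glm(y01  ~ x, family = binomial(link = "logit"))
fit_fac    <- glm(yfac ~ x, family = binomial(link = "logit"))
fit_probit <- glm(y01  ~ x, family = binomial(link = "probit"))

# Numeric 0/1 and factor codings give the same logistic fit
same_fit <- all.equal(coef(fit_num), coef(fit_fac))
same_fit

# Fitted values are estimated probabilities, so they lie in (0, 1)
range(fitted(fit_num))
```

If the response in the data file were coded some other way (for example 1/2), it would need recoding to 0/1 or conversion to a factor before being passed to glm() with the binomial family.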