Dear All,
I am learning the ropes of logistic regression in R.
I found some interesting examples

http://bit.ly/Vq4GgX
http://bit.ly/W9fUTg
http://bit.ly/UfK73e

but I am a bit lost. I have several questions.

1) For instance, what is the difference between the following two calls?

glm.out = glm(response ~ poverty + gender, family=binomial(logit), data=mydata)

and

glm.out = glm(response ~ poverty * gender, family=binomial(logit), data=mydata)

This raises the question of when I should use the "*" or the "+" sign when doing a logistic regression on several explanatory variables. I think that in the former case I am allowing for an interaction between poverty and gender, but I would like to be sure about it.

2) Consider the following snippet

glm.out = glm(response ~ poverty + gender, family=binomial(logit), data=mydata)

where "response" is a dichotomous variable, poverty takes only two values (Above poverty line and Below poverty line) and gender takes only the values Male or Female. The command above leads to the following output:

#######################################
print(summary(glm.out))

Call:
glm(formula = response ~ poverty + gender, family = binomial(logit),
    data = mydata)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2094   0.4269   0.4269   0.8033   1.1911

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                 0.9656     0.1477   6.538 6.25e-11 ***
povertyBelow poverty line  -0.9978     0.3246  -3.074  0.00211 **
genderFEMALE                1.3840     0.2549   5.429 5.68e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 494.81  on 499  degrees of freedom
Residual deviance: 457.13  on 497  degrees of freedom
AIC: 463.13

Number of Fisher Scoring iterations: 4
##############################################

To calculate the odds ratios, I should do the following:

exp(coef(glm.out))
              (Intercept) povertyBelow poverty line              genderFEMALE
                2.6263831                 0.3687033                 3.9909627

but here I am lost about the interpretation. For instance, what are the odds of a positive response for those above versus below the poverty line in males? In females?

I think that everything is there, but I cannot extract/interpret the info provided by R correctly.
Any help is appreciated.
Cheers

Lorenzo
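A minimal sketch of how the summary above can be read for the four poverty/gender groups. It assumes the reference levels are "Above poverty line" and Male (the output shows coefficients only for `povertyBelow poverty line` and `genderFEMALE`, so those are the non-reference levels). The coefficients are copied by hand from the printed output (four decimals) rather than taken from a fitted object, and the cell names such as `male_above` are hypothetical labels chosen for illustration.

```r
# Coefficients copied by hand from the summary(glm.out) output above
# (additive model: response ~ poverty + gender).
b0      <- 0.9656    # (Intercept): log-odds for the reference group (male, above poverty line)
b_below <- -0.9978   # povertyBelow poverty line
b_fem   <- 1.3840    # genderFEMALE

# Linear predictor (log-odds) for each of the four poverty x gender cells.
# Assumption: reference levels are "Above poverty line" and Male.
log_odds <- c(
  male_above   = b0,
  male_below   = b0 + b_below,
  female_above = b0 + b_fem,
  female_below = b0 + b_below + b_fem
)
odds  <- exp(log_odds)      # odds of a positive response in each cell
probs <- plogis(log_odds)   # P(positive response) = odds / (1 + odds)

# Odds ratio, below vs above the poverty line, within each gender
or_male   <- odds[["male_below"]]   / odds[["male_above"]]
or_female <- odds[["female_below"]] / odds[["female_above"]]

print(round(odds, 3))
print(round(probs, 3))
print(c(or_male = or_male, or_female = or_female))  # both equal exp(b_below)
```

Because the additive model has no poverty:gender term, the below-versus-above odds ratio is the same for males and females, exp(-0.9978), about 0.37: in either gender the odds of a positive response below the poverty line are roughly 37% of the odds above it. Gender only shifts the baseline: at either poverty level, the female odds are exp(1.3840), about 3.99 times the male odds.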
Lorenzo:

Are you "learning the ropes" or "on the ropes"? (Heh, heh. I couldn't resist that bad bit of English slang. I apologize to all non-native English speakers, and maybe to the native ones, too. But sometimes my evil inner twin just gets the better of me.)

However, to try to be helpful: it sounds to me like you have considerable confusion about the basics of GLMs. I don't think this is the right forum to resolve that confusion. I think you either need to do some more studying on your own or, better yet, consult a local statistical expert, who should be able to help you out fairly quickly.

Cheers,
Bert

On Sun, Dec 30, 2012 at 10:14 AM, Lorenzo Isella <lorenzo.isella@gmail.com> wrote:
> Dear All,
> I am learning the ropes about logistic regression in R.

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
Took some googling, but it was worth it! :)
Best,
Simone

On 30 Dec 2012, at 19:44, Bert Gunter <gunter.berton at gene.com> wrote:
> Are you "learning the ropes" or "on the ropes"?
At 18:14 30/12/2012, Lorenzo Isella wrote:
>1) For instance, what is the difference between
>
>glm.out = glm(response ~ poverty + gender, family=binomial(logit),
>               data=mydata)
>
>and
>
>glm.out = glm(response ~ poverty * gender, family=binomial(logit),
>               data=mydata)
>? Which begs the question when I should use the "*" or "+" sign when doing
>a logistic regression on several explanatory variables. I think that in
>the former case I am allowing for an interaction between poverty and
>gender, but I would like to be sure about it.

I think you need to (re-)read any introductory text on R, in particular about the use of formulae. The asterisk implies an interaction. This also answers your second question, I think.

Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html
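The point that the asterisk implies an interaction can be checked directly with R's `terms()`: `poverty * gender` is shorthand for `poverty + gender + poverty:gender`, so it is the `*` call (the second one in the question), not the `+` call, that includes the interaction. The sketch below fits both models to simulated data. The data frame, the seed and the "true" coefficients (borrowed from the posted summary) are hypothetical stand-ins for the poster's `mydata`, not his actual data.

```r
# Hypothetical simulated data standing in for 'mydata' (not the poster's data).
set.seed(1)
n <- 500
mydata <- data.frame(
  poverty = factor(sample(c("Above poverty line", "Below poverty line"), n, replace = TRUE)),
  gender  = factor(sample(c("MALE", "FEMALE"), n, replace = TRUE),
                   levels = c("MALE", "FEMALE"))   # MALE is the reference level
)
# Simulate a binary response from an additive logistic model (no interaction).
eta <- 0.9656 - 0.9978 * (mydata$poverty == "Below poverty line") +
       1.3840 * (mydata$gender == "FEMALE")
mydata$response <- rbinom(n, size = 1, prob = plogis(eta))

# '*' expands to main effects plus interaction; '+' gives main effects only.
f_add <- response ~ poverty + gender
f_int <- response ~ poverty * gender
print(attr(terms(f_add), "term.labels"))  # c("poverty", "gender")
print(attr(terms(f_int), "term.labels"))  # c("poverty", "gender", "poverty:gender")

fit_add <- glm(f_add, family = binomial(link = "logit"), data = mydata)
fit_int <- glm(f_int, family = binomial(link = "logit"), data = mydata)

# Likelihood-ratio test: does the interaction term improve the fit?
print(anova(fit_add, fit_int, test = "Chisq"))

# With the interaction, the below/above odds ratio differs by gender.
cf <- coef(fit_int)
or_male   <- exp(cf[["povertyBelow poverty line"]])
or_female <- exp(cf[["povertyBelow poverty line"]] +
                 cf[["povertyBelow poverty line:genderFEMALE"]])
print(c(or_male = or_male, or_female = or_female))
```

In the interaction model the odds ratio for below versus above the poverty line is exp(b1) for males and exp(b1 + b3) for females, where b1 is the `povertyBelow poverty line` coefficient and b3 the `povertyBelow poverty line:genderFEMALE` coefficient; exp(b3) is the ratio of the two gender-specific odds ratios. With `+`, that ratio is forced to be 1. Since these data were simulated without an interaction, any apparent interaction here is sampling noise.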