On 06 Feb 2015, at 16:58, Michael Dewey <info at aghmed.fsnet.co.uk> wrote:

> Dear Mohamed
>
> Your dataset did not make it through; the list strips most attachments.
>
> In my area of application I would be suspicious that such an odds ratio was the result of a data error or my misunderstanding of the underlying science. You are probably in the best position to judge both of these in your area.
>

If both variables are binary, a table would be informative:

with(divs, table(Div, PRFD))

The output is roughly consistent with odds 1:30 if PRFD==0 and 3:1 if PRFD==1. That sounds extreme, but not entirely implausible, depending on the field of application.

> Michael
>
>
> On 06/02/2015 07:42, Mohamed Farah wrote:
>> I have run a logit regression with two categorical variables (coded 0 and 1): payment (1) / non-payment (0) on profit (profitable = 1, non-profitable = 0) for 375 entities. Here is the result from R:
>>
>>> divgress <- glm(Div ~ PRFD, family = binomial(link = "logit"), data = divs)
>>> summary(divgress)
>>
>> Call:
>> glm(formula = Div ~ PRFD, family = binomial(link = "logit"),
>>     data = divs)
>>
>> Deviance Residuals:
>>     Min       1Q   Median       3Q      Max
>> -1.6765  -0.2626   0.7502   0.7502   2.6017
>>
>> Coefficients:
>>             Estimate Std. Error z value Pr(>|z|)
>> (Intercept)  -3.3499     0.7194  -4.656 3.22e-06 ***
>> PRFD          4.4738     0.7311   6.119 9.41e-10 ***
>> ---
>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> (Dispersion parameter for binomial family taken to be 1)
>>
>>     Null deviance: 491.84  on 376  degrees of freedom
>> Residual deviance: 371.78  on 375  degrees of freedom
>> AIC: 375.78
>>
>> Number of Fisher Scoring iterations: 6
>>
>> My question is that the coefficient of the independent variable (on the log-odds scale), 4.4738, is difficult to interpret. I exponentiated the coefficients below, and the result of 87.69 is so high that I suspect something is not working right.
>>
>>> exp(coef(divgress))
>> (Intercept)        PRFD
>>  0.03508772 87.69230769
>>
>> The dataset is attached. I appreciate your help.
>>
>> This email communication is confidential and may be privileged or otherwise protected. It is intended exclusively for those individuals and entities addressed above and any other persons who have been specifically authorised to receive it. You should not copy it or disclose its contents to anyone. If you are not such an intended recipient, please notify us that you have received this email in error. Please then delete the email and any attachments and destroy any copies of it. We regret any inconvenience resulting from erroneous delivery of this message and thank you for your cooperation.
>> Please note that none of the Supreme Committee for Delivery and Legacy or any of its affiliated entities will have any liability for any incorrect or incomplete transmission of the information contained in this email nor for any delay in its receipt. Emails are not secure and cannot be guaranteed to be error free. Anyone who communicates with us by email is taken to accept these risks. Any views or opinions expressed in this email are solely those of the author and do not necessarily represent those of the Supreme Committee for Delivery and Legacy or any of its affiliated entities. Any logo trademark or other intellectual property forming part of or attached to this email belongs exclusively to the Supreme Committee for Delivery and Legacy. Any unauthorised reproduction copying or other use by you or others is strictly prohibited.
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>> -----
>> No virus found in this message.
>> Checked by AVG - www.avg.com
>> Version: 2015.0.5645 / Virus Database: 4281/9067 - Release Date: 02/06/15
>>
>
> --
> Michael
> http://www.dewey.myzen.co.uk

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
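With a single 0/1 predictor, the two coefficients map directly onto the two groups: exp(intercept) is the odds of Div == 1 when PRFD == 0, exp(intercept + slope) is the odds when PRFD == 1, and exp(slope) is the ratio of those two odds. A minimal sketch that uses only the estimates printed in the summary(divgress) output above; the variable names (b0, b1, odds0, and so on) are illustrative and not from the thread:

```r
# Estimates as printed by summary(divgress); names here are illustrative
b0 <- -3.3499   # (Intercept): log-odds of Div == 1 when PRFD == 0
b1 <-  4.4738   # PRFD: change in log-odds when PRFD goes from 0 to 1

odds0      <- exp(b0)        # odds of paying when PRFD == 0, about 0.035
odds1      <- exp(b0 + b1)   # odds of paying when PRFD == 1, about 3.08
odds_ratio <- exp(b1)        # odds1 / odds0, about 87.7

p0 <- plogis(b0)             # probability of paying when PRFD == 0, about 0.034
p1 <- plogis(b0 + b1)        # probability of paying when PRFD == 1, about 0.75

round(c(odds0 = odds0, odds1 = odds1, OR = odds_ratio, p0 = p0, p1 = p1), 3)
```

Reading the odds ratio alongside the two implied probabilities (roughly 3% versus 75%) is often easier than interpreting 87.7 on its own.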
Peter,

Appreciate the comment. Here is a summary table. Both variables (Profit and DIV, with 0 = no and 1 = yes) are binary, as you pointed out. I have 368 companies, of which 342 were profitable and 26 unprofitable. Of the 342, 79 paid no dividends and 263 paid dividends. Of the 26, 25 paid no dividends and 1 paid dividends. The result is 104 non-payers and 264 dividend payers.

                 DIV
Profit             0     1   Grand Total
0                 25     1            26
1                 79   263           342
Grand Total      104   264           368

________________________________________
From: peter dalgaard [pdalgd at gmail.com]
Sent: Friday, February 06, 2015 7:35 PM
To: Michael Dewey
Cc: Mohamed Farah; r-help at r-project.org
Subject: Re: [R] Interpreting a Logit regression result
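The summary table can be checked directly in R. The sketch below enters Mohamed's counts by hand (the real `divs` was not posted, so `counts`, `long`, and `fit` are hypothetical stand-ins), computes the cross-product odds ratio, and refits the same logit model on one row per company. For a single binary predictor, exp() of the fitted slope should equal the cross-product ratio.

```r
# Mohamed's summary table, entered by hand (the original `divs` was not posted).
# Rows: Profit (0 = unprofitable, 1 = profitable); columns: DIV (0 = no dividend, 1 = dividend)
counts <- matrix(c(25,   1,
                   79, 263),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Profit = c("0", "1"), DIV = c("0", "1")))
print(addmargins(as.table(counts)))   # adds "Sum" margins, i.e. the Grand Totals

# Cross-product odds ratio: (25 * 263) / (1 * 79), about 83.23
or_table <- (counts["0", "0"] * counts["1", "1"]) /
            (counts["0", "1"] * counts["1", "0"])

# Expand to one row per company, mimicking the layout of `divs`
long <- data.frame(
  Profit = rep(c(0, 0, 1, 1), times = c(25, 1, 79, 263)),
  DIV    = rep(c(0, 1, 0, 1), times = c(25, 1, 79, 263))
)
fit <- glm(DIV ~ Profit, family = binomial(link = "logit"), data = long)

print(exp(coef(fit)))   # intercept: odds 1/25 = 0.04; Profit: same value as or_table
print(fit$df.null)      # 367, i.e. 368 companies minus 1
```

With these counts the fitted odds ratio is about 83.2 and the null deviance has 367 df, not the 87.7 and 376 df reported for `divgress`.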
> On 06 Feb 2015, at 18:25, Mohamed Farah <m.farah at sc.qa> wrote:
>
>                  DIV
> Profit             0     1   Grand Total
> 0                 25     1            26
> 1                 79   263           342
> Grand Total      104   264           368

This doesn't quite fit with the 376 df for the null deviance in your glm() (368 observations would give 367). However, the OR for that table is 25*263/(79*1) = 83.23, which isn't far off.

Still, something is strange. Your glm() output had odds for one group at 0.03508772, but

> 1/29
[1] 0.03448276
> 1/28
[1] 0.03571429

whereas

> 2/57
[1] 0.03508772

and the odds for the other group should be

> .03508772*87.69230769
[1] 3.076923

which is pretty much exactly 40/13:

> 40/13
[1] 3.076923

Now, to fit the 376 null df, I'd expect 377 obs total. That fits if the other group is actually 240:78, so that the entire table is

 57   2
 78 240

And, lo and behold:

> x <- rep(0:1, c(59,318))
> y <- rep(c(0,1,0,1),c(57,2,78,240))
> summary(glm(y~x, binomial))

Call:
glm(formula = y ~ x, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6765  -0.2626   0.7502   0.7502   2.6017

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.3499     0.7194  -4.656 3.22e-06 ***
x             4.4738     0.7311   6.119 9.41e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 491.84  on 376  degrees of freedom
Residual deviance: 371.78  on 375  degrees of freedom
AIC: 375.78

-pd
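Peter's reconstructed data can be cross-tabulated to confirm that the cross-product odds ratio of the 2 x 2 table equals exp() of the glm slope, and that 377 observations give the 376 null df. A short sketch reusing his `x` and `y`; the names `tab`, `or_table`, `or_glm`, and `fit` are illustrative:

```r
# Peter's reconstructed data: 59 observations with x == 0 and 318 with x == 1
x <- rep(0:1, c(59, 318))
y <- rep(c(0, 1, 0, 1), c(57, 2, 78, 240))

tab <- table(x, y)   # rows: x (PRFD); columns: y (Div)
print(tab)

# Cross-product odds ratio of the 2 x 2 table: (57 * 240) / (2 * 78)
or_table <- (tab["0", "0"] * tab["1", "1"]) / (tab["0", "1"] * tab["1", "0"])

fit <- glm(y ~ x, family = binomial)
or_glm <- exp(coef(fit))[["x"]]

print(c(table = or_table, glm = or_glm))   # both about 87.69
print(fit$df.null)                         # 376, since there are 377 observations
```

If the real `divs` is available, comparing `nrow(divs)` and `with(divs, table(Div, PRFD))` against the 368-company summary should show where the extra observations come from.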