wim nursal
2011-Dec-12 10:46 UTC
[R] calculating logit parameters (odd ratio is exactly one or zero)
Dear statistician experts,

Sorry if this is a trivial question, or the same old question (I don't know what an efficient keyword for this issue would be).

In order to understand how the parameters of a logistic regression are calculated, I did an exercise in a spreadsheet, following a worked example from the literature and an available spreadsheet (with calculation formulas). I ended up with infinity (division by zero) when the odds ratio is exactly 1 (FD=12), or an invalid number after taking the log when the odds ratio is zero (MFD = 0).

I am wondering how R, through the glm function (particularly logit or logistic regression), treats odds ratios or log odds ratios that are exactly one or zero.

The sample data is like this:

#HH  Fsize     FD
1    1.29472   0
2    1.6184    0
3    2.4276    1
4    2.4276    2
5    20.23     2
6    1.6184    3
7    1.820     3
8    0.4046    3
9    6.069     4
10   2.6299    4
11   0.72828   5
12   2.4276    5
13   6.069     7
14   4.8552    7
15   2.32645   7
16   1.6184    8
17   1.0115    8
18   1.0115    8
19   5.2598    9
20   2.023     10
21   0.6069    10
22   1.2138    11
23   0.8092    11
24   1.4161    11
25   0.6069    11
26   3.440     11
27   1.2138    12
28   1.2138    12
29   0.4046    12
30   1.2138    12

Fsize is the farm size (acre or hectare). Food deficit (FD) is the number of months (in the year before the survey took place) in which a household had to buy food grains (minimum = 0 months, maximum = 12 months, i.e. a whole-year deficit).

Even though I "jittered" only the minimum or maximum FD values (e.g. FD=0+1e-6 or FD=12-1e-6), nothing changed in the result.

The formula I used is like this:
--------------------------------------------------------------
glm(FD ~ Fsize, data = subFS)
--
Coefficients:
(Intercept)        Fsize
     7.7913      -0.3092

Degrees of Freedom: 29 Total (i.e. Null);  28 Residual
Null Deviance:      463
Residual Deviance: 425.5    AIC: 170.7
--------------------------------------------------------------

I would appreciate any clarification.

Best wishes,
Wim
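The infinities described above come from taking the logit of the observed proportions cell by cell. A minimal sketch of that, using three made-up households (the names fd, p, emp_logit and loglik and the probability 0.4 are illustrative assumptions, not part of the original post):

```r
# Three hypothetical households: 0, 6 and 12 deficit months out of 12
fd <- c(0, 6, 12)
p  <- fd / 12                      # observed proportions: 0, 0.5, 1

# Empirical log-odds, as a spreadsheet would compute them per row
emp_logit <- log(p / (1 - p))      # -Inf, 0, Inf
print(emp_logit)
print(qlogis(p))                   # built-in logit gives the same values

# A likelihood-based fit does not need the logit of the observed values.
# The binomial log-likelihood is evaluated at a fitted probability that
# lies strictly between 0 and 1 (0.4 is an arbitrary illustrative value),
# and it stays finite even for 0 or 12 observed months.
loglik <- dbinom(fd, size = 12, prob = 0.4, log = TRUE)
print(loglik)
```

This is only meant to show where the spreadsheet calculation breaks down: the link function is applied to the fitted mean, not to the raw 0 and 1 proportions.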
Uwe Ligges
2011-Dec-12 20:51 UTC
[R] calculating logit parameters (odd ratio is exactly one or zero)
1. The formula you used does not fit a logistic regression but an ordinary linear regression, since you are using the default gaussian family rather than family="binomial" (or whatever is appropriate).

2. Neither R nor any other software can deal with perfect separation (or quasi-separation) of classes, since the problem is not well defined in such a case, as you have already found out. R will give a warning in that case, that the Fisher scoring does not converge. LDA will give perfect results in such a case (well, unless the within-class covariance matrix is singular).

Best,
Uwe Ligges
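Both points can be tried out with a short sketch. The data frame below re-enters the 30 rows from the original post. Treating FD as "deficit months out of 12" via cbind(FD, 12 - FD) is an assumption about what the poster intended, not something stated in the thread, and the six-row toy data set used to provoke separation is made up for illustration.

```r
# Data from the original post (subFS is the poster's name for it)
subFS <- data.frame(
  Fsize = c(1.29472, 1.6184, 2.4276, 2.4276, 20.23, 1.6184, 1.820, 0.4046,
            6.069, 2.6299, 0.72828, 2.4276, 6.069, 4.8552, 2.32645, 1.6184,
            1.0115, 1.0115, 5.2598, 2.023, 0.6069, 1.2138, 0.8092, 1.4161,
            0.6069, 3.440, 1.2138, 1.2138, 0.4046, 1.2138),
  FD    = c(0, 0, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 7, 7, 7, 8, 8, 8, 9, 10,
            10, 11, 11, 11, 11, 11, 12, 12, 12, 12)
)

# Point 1: without a family argument, glm() uses the gaussian family,
# i.e. an ordinary linear regression of FD on Fsize.
fit_gauss <- glm(FD ~ Fsize, data = subFS)
print(coef(fit_gauss))   # should match the coefficients quoted in the post

# Assumed intent: model deficit months out of 12 with a logit link.
# The two-column response is (successes, failures).
fit_bin <- glm(cbind(FD, 12 - FD) ~ Fsize, family = binomial, data = subFS)
print(coef(fit_bin))     # finite, although some rows have FD = 0 or FD = 12

# Point 2: a made-up, perfectly separated binary data set.
toy <- data.frame(x = 1:6, y = c(0, 0, 0, 1, 1, 1))

# Collect the warnings instead of printing them, so they can be inspected.
sep_warnings <- character(0)
fit_sep <- withCallingHandlers(
  glm(y ~ x, family = binomial, data = toy),
  warning = function(w) {
    sep_warnings <<- c(sep_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(sep_warnings)      # warnings raised by glm.fit
print(coef(fit_sep))     # estimates drift towards +/- infinity
```

Rows with FD = 0 or FD = 12 are not a problem in themselves. The fit only becomes ill-defined when the predictor separates the outcomes completely, as in the toy data.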