Atul Malik
2007-Oct-05 05:39 UTC
[R] discrepancy in the result of R and SAS on same data in logistic regression
Dear Members,

Greetings! I have come across a discrepancy between the R and SAS results for logistic regression on the same data.

When I processed the above CSV file (1000.csv) to predict Action (i/c) from AgeGroup (1-7, Na) and Gender (M, F, Na) with glm in R, I get:

R result

Call:
glm(formula = Action ~ Gender + AgeGroup, family = binomial,
    data = mydata1, na.action = na.pass)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.828  -0.973  -0.709   1.087   1.734

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.2939     0.3180   4.069 4.73e-05 ***
GenderM      -0.8794     0.1637  -5.371 7.85e-08 ***
GenderNa     -1.4407     0.2749  -5.240 1.60e-07 ***
AgeGroup2    -1.2053     0.3971  -3.035  0.00240 **
AgeGroup3    -1.6670     0.3262  -5.110 3.21e-07 ***
AgeGroup4    -1.0786     0.3714  -2.904  0.00368 **
AgeGroup5    -0.8232     0.3829  -2.150  0.03156 *
AgeGroup6     0.1682     0.3501   0.481  0.63081
AgeGroup7    -0.3361     0.3617  -0.929  0.35281
AgeGroupNa   -1.7956     0.3433  -5.231 1.69e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1342.7  on 999  degrees of freedom
Residual deviance: 1213.2  on 990  degrees of freedom
AIC: 1233.2

Number of Fisher Scoring iterations: 4

whereas SAS gives, on the same data:

Analysis of Maximum Likelihood Estimates

                                     Standard        Wald
Parameter      Action  DF  Estimate     Error  Chi-Square  Pr > ChiSq
Intercept      c        1    0.3217    0.0953     11.4025      0.0007
AgeGroup  2    c        1    0.3631    0.2434      2.2260      0.1357
AgeGroup  3    c        1    0.8248    0.1411     34.1508      <.0001
AgeGroup  4    c        1    0.2364    0.2146      1.2136      0.2706
AgeGroup  5    c        1   -0.0190    0.2299      0.0068      0.9343
AgeGroup  6    c        1   -1.0104    0.1822     30.7454      <.0001
AgeGroup  7    c        1   -0.5061    0.1974      6.5711      0.0104
AgeGroup  Na   c        1    0.9534    0.1718     30.7884      <.0001
Gender    M    c        1    0.1060    0.1103      0.9246      0.3363
Gender    N    c        1    0.6674    0.1686     15.6744      <.0001

I compared the resulting probabilities of Action "c" from all three packages (R, SAS and StatGraphics) and found that R and StatGraphics give the same results, but SAS gives different results for some combinations of AgeGroup and Gender, as shown in the attached document for the probability of Action.

I would appreciate your help in sorting out this issue.

Thanks and Best Regards
Atul Malik

StatGraphics results as follows:

Estimated Regression Model (Maximum Likelihood)

                          Standard   Estimated
Parameter     Estimate       Error  Odds Ratio
CONSTANT      -1.94239    0.298622
AgeGroup=1     1.79555    0.343277     6.02282
AgeGroup=2     0.590229   0.316943     1.8044
AgeGroup=3     0.128605   0.216341     1.13724
AgeGroup=4     0.716996   0.288917     2.04827
AgeGroup=5     0.972326   0.30544      2.64409
AgeGroup=6     1.9638     0.262721     7.12638
AgeGroup=7     1.45945    0.275966     4.3036
Gender=F       1.44072    0.274922     4.22375
Gender=M       0.56134    0.256286     1.75302

Analysis of Deviance

Source          Deviance   Df  P-Value
Model            129.506    9   0.0000
Residual         1213.21  990   0.0000
Total (corr.)    1342.71  999
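
The comparison of Action "c" probabilities described above can be reproduced from the printed coefficients alone. The following is a minimal Python sketch (used here only for the arithmetic), not a statement of how any of the packages was actually configured. It rests on three assumptions that are not stated in the post: (1) R and StatGraphics model P(Action = "i") with reference (treatment) coding, the unlisted level of each factor being the reference; (2) SAS models P(Action = "c") with effect coding, so the unlisted levels (AgeGroup 1 and Gender F) take minus the sum of the listed coefficients; (3) the SAS row "Gender N" is the Na level. All function and variable names are illustrative.

```python
import math

# Coefficients copied from the three outputs in the post.
R_INTERCEPT = 1.2939
R_AGE = {"2": -1.2053, "3": -1.6670, "4": -1.0786, "5": -0.8232,
         "6": 0.1682, "7": -0.3361, "Na": -1.7956}
R_GENDER = {"M": -0.8794, "Na": -1.4407}

SAS_INTERCEPT = 0.3217
SAS_AGE = {"2": 0.3631, "3": 0.8248, "4": 0.2364, "5": -0.0190,
           "6": -1.0104, "7": -0.5061, "Na": 0.9534}
SAS_GENDER = {"M": 0.1060, "Na": 0.6674}  # ASSUMPTION: SAS row "N" is level Na

SG_CONSTANT = -1.94239
SG_AGE = {"1": 1.79555, "2": 0.590229, "3": 0.128605, "4": 0.716996,
          "5": 0.972326, "6": 1.9638, "7": 1.45945}
SG_GENDER = {"F": 1.44072, "M": 0.56134}

AGES = ["1", "2", "3", "4", "5", "6", "7", "Na"]
GENDERS = ["F", "M", "Na"]


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def treatment(table, level):
    """Reference coding: the unlisted (reference) level contributes 0."""
    return table.get(level, 0.0)


def effect(table, level):
    """ASSUMED effect coding: the unlisted level is minus the sum of the rest."""
    return table[level] if level in table else -sum(table.values())


def p_c_r(age, gender):
    # ASSUMPTION: the fit is for P(Action == "i"), so P(c) is the complement.
    eta = R_INTERCEPT + treatment(R_AGE, age) + treatment(R_GENDER, gender)
    return 1.0 - logistic(eta)


def p_c_statgraphics(age, gender):
    # ASSUMPTION: same modelled event as R, with Na as the reference level.
    eta = SG_CONSTANT + treatment(SG_AGE, age) + treatment(SG_GENDER, gender)
    return 1.0 - logistic(eta)


def p_c_sas(age, gender):
    # The "Action c" column suggests the fit is for P(Action == "c") directly.
    eta = SAS_INTERCEPT + effect(SAS_AGE, age) + effect(SAS_GENDER, gender)
    return logistic(eta)


# Largest disagreement between the three packages over all 24 combinations.
max_gap = 0.0
for age in AGES:
    for gender in GENDERS:
        probs = (p_c_r(age, gender),
                 p_c_statgraphics(age, gender),
                 p_c_sas(age, gender))
        max_gap = max(max_gap, max(probs) - min(probs))
        print(f"AgeGroup={age:>2} Gender={gender:>2} "
              f"R={probs[0]:.4f} SG={probs[1]:.4f} SAS={probs[2]:.4f}")

print("largest gap between packages:", max_gap)
```

Under these assumptions the three sets of coefficients give the same P(Action = "c") for every combination, up to the rounding of the printed estimates. If the unlisted SAS levels were instead read as zero, as in reference coding, the SAS probabilities would differ only for combinations involving AgeGroup 1 or Gender F.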