Paul Johnson
2004-Sep-30 21:40 UTC
[R] polr (MASS) and lrm (Design) differences in tests of statistical significance
Greetings:

I'm running R-1.9.1 on Fedora Core 2 Linux.

I tested a proportional odds logistic regression with MASS's polr and Design's lrm. Parameter estimates between the 2 are consistent, but the standard errors are quite different, and the conclusions from the t and Wald tests are dramatically different. I cranked the "abstol" argument up quite a bit in the polr method and it did not make the differences go away.

So

1. Can you help me see why the std. errors in the polr are so much smaller, and

2. Can I hear more opinions on the question of t vs. Wald in making these signif tests. So far, I understand the t is based on the asymptotic Normality of the estimate of b, and for finite samples b/se is not exactly distributed as a t. But I also had the impression that the Wald value was an approximation as well.

> summary(polr(as.factor(RENUCYC) ~ DOCS + PCT65PLS*RANNEY2 + OLDCRASH
    + FISCAL2 + PCTMETRO + ADMLICEN, data=elaine1))

Re-fitting to get Hessian

Call:
polr(formula = as.factor(RENUCYC) ~ DOCS + PCT65PLS * RANNEY2 +
    OLDCRASH + FISCAL2 + PCTMETRO + ADMLICEN, data = elaine1)

Coefficients:
                        Value  Std. Error   t value
DOCS              0.004942217 0.002952001  1.674192
PCT65PLS          0.454638558 0.113504288  4.005475
RANNEY2           0.110473483 0.010829826 10.200855
OLDCRASH          0.139808663 0.042245692  3.309418
FISCAL2           0.025592117 0.011465812  2.232037
PCTMETRO          0.018184093 0.007792680  2.333484
ADMLICEN         -0.028490387 0.011470999 -2.483688
PCT65PLS:RANNEY2 -0.008559228 0.001456543 -5.876400

Intercepts:
        Value Std. Error t value
2|3    6.6177     0.3019 21.9216
3|4    7.1524     0.2773 25.7938
4|5   10.5856     0.2149 49.2691
5|6   12.2132     0.1858 65.7424
6|8   12.2704     0.1856 66.1063
8|10  13.0345     0.2184 59.6707
10|12 13.9801     0.3517 39.7519
12|18 14.6806     0.5587 26.2782

Residual Deviance: 587.0995
AIC: 619.0995

> lrm(RENUCYC ~ DOCS + PCT65PLS*RANNEY2 + OLDCRASH + FISCAL2 +
    PCTMETRO + ADMLICEN, data=elaine1)

Logistic Regression Model

lrm(formula = RENUCYC ~ DOCS + PCT65PLS * RANNEY2 + OLDCRASH +
    FISCAL2 + PCTMETRO + ADMLICEN, data = elaine1)

Frequencies of Responses
  2   3   4   5   6   8  10  12  18
 21  12 149  46   1  10   6   2   2

Frequencies of Missing Values Due to Each Variable
 RENUCYC     DOCS PCT65PLS  RANNEY2 OLDCRASH  FISCAL2 PCTMETRO ADMLICEN
       5        0        0        6        0        5        0        5

 Obs Max Deriv Model L.R. d.f. P     C   Dxy Gamma Tau-a   R2 Brier
 249     7e-05      56.58    8 0 0.733 0.465  0.47 0.278 0.22 0.073

                         Coef     S.E. Wald Z      P
y>=3                -6.617857 6.716688  -0.99 0.3245
y>=4                -7.152561 6.716571  -1.06 0.2869
y>=5               -10.585705 6.742222  -1.57 0.1164
y>=6               -12.213340 6.755656  -1.81 0.0706
y>=8               -12.270506 6.755571  -1.82 0.0693
y>=10              -13.034584 6.756829  -1.93 0.0537
y>=12              -13.980235 6.767724  -2.07 0.0389
y>=18              -14.680760 6.786639  -2.16 0.0305
DOCS                 0.004942 0.002932   1.69 0.0918
PCT65PLS             0.454653 0.552430   0.82 0.4105
RANNEY2              0.110475 0.076438   1.45 0.1484
OLDCRASH             0.139805 0.042104   3.32 0.0009
FISCAL2              0.025592 0.011374   2.25 0.0245
PCTMETRO             0.018184 0.007823   2.32 0.0201
ADMLICEN            -0.028490 0.011576  -2.46 0.0138
PCT65PLS * RANNEY2  -0.008559 0.006417  -1.33 0.1822

--
Paul E. Johnson                       email: pauljohn at ku.edu
Dept. of Political Science            http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66044-3177           FAX: (785) 864-5700
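One way to see where the two printouts disagree is to take the ratio of the reported standard errors term by term. The sketch below uses only the slope standard errors printed in the message above; the vector names se_polr and se_lrm and the 5% tolerance are introduced here purely for illustration.

```r
# Slope standard errors copied from the polr and lrm printouts in the post.
se_polr <- c(DOCS = 0.002952001, PCT65PLS = 0.113504288,
             RANNEY2 = 0.010829826, OLDCRASH = 0.042245692,
             FISCAL2 = 0.011465812, PCTMETRO = 0.007792680,
             ADMLICEN = 0.011470999, "PCT65PLS:RANNEY2" = 0.001456543)

se_lrm <- c(DOCS = 0.002932, PCT65PLS = 0.552430,
            RANNEY2 = 0.076438, OLDCRASH = 0.042104,
            FISCAL2 = 0.011374, PCTMETRO = 0.007823,
            ADMLICEN = 0.011576, "PCT65PLS:RANNEY2" = 0.006417)

# Ratio of lrm to polr standard errors, term by term.
ratio <- se_lrm / se_polr
print(round(ratio, 2))

# Terms whose standard errors differ by more than 5% (arbitrary tolerance).
differs <- names(ratio)[abs(ratio - 1) > 0.05]
print(differs)
```

On these figures the two functions agree to within about 1% for DOCS, OLDCRASH, FISCAL2, PCTMETRO and ADMLICEN. The disagreement is confined to PCT65PLS, RANNEY2 and their interaction, where the lrm standard errors are roughly four to seven times larger, and to the intercepts. That pattern suggests looking at the terms involved in the interaction rather than at a general difference between the two functions.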
John Fox
2004-Oct-01 00:34 UTC
[R] polr (MASS) and lrm (Design) differences in tests of statistical significance
Dear Paul,

I tried polr() and lrm() on a different problem and (except for the difference in signs for the cut-points/intercepts) got identical results for both coefficients and standard errors. There might be something ill-conditioned about your problem that produces the discrepancy -- I noticed, for example, that some of the upper categories of the response are very sparse. Perhaps the two functions use different forms of the information matrix. I expect that someone else will be able to supply more details.

I believe that the t-statistics in the polr() output are actually Wald statistics.

I hope this helps,
 John
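The point about the "t value" column can be checked directly. The following is a minimal sketch, assuming the housing data set that ships with MASS as a stand-in (the elaine1 data from the original post are not available here); the object names fit1, fit0, z, p_wald, lr_stat, lr_df and p_lr are illustrative. It recomputes the column as estimate divided by standard error, and then runs a likelihood-ratio test for one term as a cross-check that depends only on the two maximized likelihoods and not on the estimated Hessian.

```r
library(MASS)  # provides polr() and the housing data set

# Illustrative fit on MASS's housing data, not the data from the thread.
fit1 <- polr(Sat ~ Infl + Type + Cont, weights = Freq,
             data = housing, Hess = TRUE)
ctab <- coef(summary(fit1))  # columns: Value, Std. Error, t value

# Standard errors from the inverse Hessian, and the Wald ratio
# estimate / standard error for every coefficient and cut-point.
se <- sqrt(diag(vcov(fit1)))
z  <- ctab[, "Value"] / se

# Two-sided p-values if the ratio is referred to a standard normal.
p_wald <- 2 * pnorm(abs(z), lower.tail = FALSE)
print(cbind(ctab, "p (normal)" = p_wald))

# Likelihood-ratio test for dropping Cont: compare deviances of the
# nested fits against a chi-squared with the difference in parameters.
fit0    <- polr(Sat ~ Infl + Type, weights = Freq,
                data = housing, Hess = TRUE)
lr_stat <- fit0$deviance - fit1$deviance
lr_df   <- fit1$edf - fit0$edf
p_lr    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(LR = lr_stat, df = lr_df, p = p_lr))
```

If the recomputed ratio matches the printed "t value" column, the column is a Wald statistic in the usual sense. When two programs report different standard errors for the same estimates, the likelihood-ratio test can help decide which set of Wald tests to trust, since it does not use either program's standard errors.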