Paul Johnson
2004-Sep-30 21:40 UTC
[R] polr (MASS) and lrm (Design) differences in tests of statistical significance
Greetings:
I'm running R-1.9.1 on Fedora Core 2 Linux.
I tested a proportional odds logistic regression with MASS's polr and
Design's lrm. Parameter estimates between the 2 are consistent, but the
standard errors are quite different, and the conclusions from the t and
Wald tests are dramatically different. I cranked the "abstol" argument
up quite a bit in the polr method and it did not make the differences go
away.
So
1. Can you help me see why the std. errors in the polr are so much
smaller, and
2. Can I hear more opinions on the question of t vs. Wald in making
these significance tests. So far, I understand the t is based on the
asymptotic Normality of the estimate of b, and for finite samples b/se
is not exactly distributed as a t. But I also had the impression that
the Wald value was an approximation as well.
> summary(polr(as.factor(RENUCYC) ~ DOCS + PCT65PLS*RANNEY2 + OLDCRASH
+ FISCAL2 + PCTMETRO + ADMLICEN, data=elaine1))
Re-fitting to get Hessian
Call:
polr(formula = as.factor(RENUCYC) ~ DOCS + PCT65PLS * RANNEY2 +
OLDCRASH + FISCAL2 + PCTMETRO + ADMLICEN, data = elaine1)
Coefficients:
                        Value  Std. Error   t value
DOCS              0.004942217 0.002952001  1.674192
PCT65PLS          0.454638558 0.113504288  4.005475
RANNEY2           0.110473483 0.010829826 10.200855
OLDCRASH          0.139808663 0.042245692  3.309418
FISCAL2           0.025592117 0.011465812  2.232037
PCTMETRO          0.018184093 0.007792680  2.333484
ADMLICEN         -0.028490387 0.011470999 -2.483688
PCT65PLS:RANNEY2 -0.008559228 0.001456543 -5.876400
Intercepts:
        Value  Std. Error  t value
2|3    6.6177  0.3019      21.9216
3|4    7.1524  0.2773      25.7938
4|5   10.5856  0.2149      49.2691
5|6   12.2132  0.1858      65.7424
6|8   12.2704  0.1856      66.1063
8|10  13.0345  0.2184      59.6707
10|12 13.9801  0.3517      39.7519
12|18 14.6806  0.5587      26.2782
Residual Deviance: 587.0995
AIC: 619.0995
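As a quick check on the table above, here is a minimal sketch in base R. It assumes the "t value" column is simply Value divided by Std. Error, in which case it can be referred to a standard normal like any Wald z statistic. The numbers are copied from the polr() output; the vector names `est` and `se` are only illustrative.

```r
# Estimates and standard errors copied from the polr() output above
est <- c(DOCS = 0.004942217, PCT65PLS = 0.454638558, RANNEY2 = 0.110473483)
se  <- c(DOCS = 0.002952001, PCT65PLS = 0.113504288, RANNEY2 = 0.010829826)

# Wald statistic: estimate divided by its standard error
z <- est / se

# Two-sided p-value from the asymptotic normal approximation
p <- 2 * pnorm(-abs(z))

print(round(cbind(Value = est, SE = se, z = z, p = p), 4))
```

If the ratios reproduce the printed "t value" column, the t and Wald columns of the two programs are the same kind of statistic, and any difference in conclusions comes from the standard errors rather than from the test.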
> lrm(RENUCYC ~ DOCS + PCT65PLS*RANNEY2 + OLDCRASH + FISCAL2 +
PCTMETRO + ADMLICEN, data=elaine1)
Logistic Regression Model
lrm(formula = RENUCYC ~ DOCS + PCT65PLS * RANNEY2 + OLDCRASH +
FISCAL2 + PCTMETRO + ADMLICEN, data = elaine1)
Frequencies of Responses
  2   3   4   5   6   8  10  12  18
 21  12 149  46   1  10   6   2   2
Frequencies of Missing Values Due to Each Variable
 RENUCYC     DOCS PCT65PLS  RANNEY2 OLDCRASH  FISCAL2 PCTMETRO ADMLICEN
       5        0        0        6        0        5        0        5
 Obs  Max Deriv  Model L.R.  d.f.  P      C    Dxy
 249      7e-05       56.58     8  0  0.733  0.465
Gamma  Tau-a    R2  Brier
 0.47  0.278  0.22  0.073
                         Coef      S.E.  Wald Z       P
y>=3                -6.617857  6.716688   -0.99  0.3245
y>=4                -7.152561  6.716571   -1.06  0.2869
y>=5               -10.585705  6.742222   -1.57  0.1164
y>=6               -12.213340  6.755656   -1.81  0.0706
y>=8               -12.270506  6.755571   -1.82  0.0693
y>=10              -13.034584  6.756829   -1.93  0.0537
y>=12              -13.980235  6.767724   -2.07  0.0389
y>=18              -14.680760  6.786639   -2.16  0.0305
DOCS                 0.004942  0.002932    1.69  0.0918
PCT65PLS             0.454653  0.552430    0.82  0.4105
RANNEY2              0.110475  0.076438    1.45  0.1484
OLDCRASH             0.139805  0.042104    3.32  0.0009
FISCAL2              0.025592  0.011374    2.25  0.0245
PCTMETRO             0.018184  0.007823    2.32  0.0201
ADMLICEN            -0.028490  0.011576   -2.46  0.0138
PCT65PLS * RANNEY2  -0.008559  0.006417   -1.33  0.1822
>
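The cut-points in the two outputs have opposite signs (polr's 2|3 is 6.6177, lrm's y>=3 is -6.617857). A minimal sketch of why that need not be a disagreement, assuming the usual parameterizations: polr models logit P(Y <= k) = zeta_k - eta, and lrm models logit P(Y >= k) = alpha_k + eta, where eta is the linear predictor. The value of `eta` below is hypothetical and chosen only for illustration.

```r
# Cut-points copied from the two outputs above (first three only)
zeta  <- c("2|3" = 6.6177, "3|4" = 7.1524, "4|5" = 10.5856)              # polr
alpha <- c("y>=3" = -6.617857, "y>=4" = -7.152561, "y>=5" = -10.585705)  # lrm

# Hypothetical value of the linear predictor x'beta, for illustration only
eta <- 8

# polr form: P(Y <= k) = plogis(zeta_k - eta)
p_le <- plogis(zeta - eta)

# lrm form: P(Y >= next category) = plogis(alpha_k + eta)
p_ge <- plogis(alpha + eta)

# The two probabilities are complementary, so each pair should sum to about 1
print(cbind(polr_P_le = p_le, lrm_P_ge = p_ge, total = p_le + p_ge))
```

Under that assumption the two sets of intercepts describe the same fitted probabilities, up to the rounding in the printed values.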
--
Paul E. Johnson email: pauljohn at ku.edu
Dept. of Political Science http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66044-3177 FAX: (785) 864-5700
John Fox
2004-Oct-01 00:34 UTC
[R] polr (MASS) and lrm (Design) differences in tests of statistical significance
Dear Paul,

I tried polr() and lrm() on a different problem and (except for the
difference in signs for the cut-points/intercepts) got identical results
for both coefficients and standard errors. There might be something
ill-conditioned about your problem that produces the discrepancy -- I
noticed, for example, that some of the upper categories of the response
are very sparse. Perhaps the two functions use different forms of the
information matrix. I expect that someone else will be able to supply
more details.

I believe that the t-statistics in the polr() output are actually Wald
statistics.

I hope this helps,
John
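One way to look for the kind of ill-conditioning mentioned above is to check the design matrix itself. The sketch below uses simulated data (the elaine1 data are not available here), so the variable names `x1` and `x2`, their means, and their spreads are all made up. It only illustrates that a raw product term of two predictors sitting far from zero can make the design matrix badly conditioned, and that centering reduces this. Whether that accounts for the discrepancy in this thread is not established.

```r
set.seed(1)
n  <- 249                           # same count as the lrm fit; the data are simulated
x1 <- rnorm(n, mean = 12, sd = 2)   # hypothetical predictor far from zero
x2 <- rnorm(n, mean = 70, sd = 10)  # hypothetical predictor on a larger scale

# Design matrix with an interaction, using the raw variables
X_raw <- model.matrix(~ x1 * x2)

# The same design after centering each predictor at its mean
x1c <- x1 - mean(x1)
x2c <- x2 - mean(x2)
X_ctr <- model.matrix(~ x1c * x2c)

# Condition numbers: larger values mean a more nearly singular problem
kappa_raw <- kappa(X_raw, exact = TRUE)
kappa_ctr <- kappa(X_ctr, exact = TRUE)
print(c(raw = kappa_raw, centered = kappa_ctr))

# Correlation between a main effect and the product term, raw versus centered
print(c(raw = cor(x1, x1 * x2), centered = cor(x1c, x1c * x2c)))
```

Refitting both polr() and lrm() with centered or rescaled predictors and comparing the standard errors again would be a natural follow-up check.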