Dear All,
I have been struggling to understand why for the housing data in MASS
library R and stata give coef. estimates that are really different. I also
tried to come up with many many examples myself (see below, of course I
did not have the set.seed command included) and all of my
`random' examples seem to give verry similar output. For the housing data,
I have changed the data into numeric vectors instead of factors/ordered
factors. I did so to try and get the same results as in STATA and to have
the housing example as close as possible to the one I constructed.
I run a debian sid, kernel 2.4, R 2.0.0, and STATA version 8.2, MASS
version 7.2-8.
here's the example ( I assume that you have STATA installed and can run in
batch mode, if not the output is also given below)
library(MASS)
library(foreign)
set.seed(100)
X <- rnorm(1000)
X1 <- rnorm(1000)
X2 <- rnorm(1000)
X <- X +X1+X2
XX <- X<=quantile(X, .25)
XX[X>quantile(X, .25) & X<=quantile(X, .50)] <- 2
XX[X>quantile(X, .5) & X<=quantile(X, .75)] <- 3
XX[X>quantile(X, .75)] <- 4
temp <- data.frame(XX=XX, X1=X1, X2=X2, X=X)
summary(polr(factor(XX)~X1 +X2, data=temp, method="probit"))
write.dta(temp, "temp.dta")
####################################
#Stata stuff
####################################
cat("use temp.dta\n oprobit XX X1 X2\n", file="temp.ado")
system("stata -b do temp.ado&")
system("cat temp.log")
#
##### here's R's output
#############################
Re-fitting to get Hessian
Call:
polr(formula = factor(XX) ~ X1 + X2, data = temp, method = "probit")
Coefficients:
Value Std. Error t value
X1 0.9891735 0.04749225 20.82811
X2 0.9400804 0.04527653 20.76309
Intercepts:
Value Std. Error t value
1|2 -1.1411 0.0572 -19.9613
2|3 -0.0372 0.0486 -0.7656
3|4 1.1101 0.0579 19.1865
Residual Deviance: 1969.734
AIC: 1979.734
##############################################
#and here Stata's output
##############################################
Ordered probit estimates Number of obs 1000
LR chi2(2) 802.86
Prob > chi2 0.0000
Log likelihood = -984.86675 Pseudo R2 0.2896
------------------------------------------------------------------------------
XX | Coef. Std. Err. z P>|z| [95% Conf.
Interval]
-------------+----------------------------------------------------------------
X1 | .9891651 .0474922 20.83 0.000 .8960822 1.082248
X2 | .94007 .0452765 20.76 0.000 .8513298 1.02881
-------------+----------------------------------------------------------------
_cut1 | -1.141119 .0571667 (Ancillary parameters)
_cut2 | -.0371779 .0485592
_cut3 | 1.110117 .0578593
On Wed, 10 Nov 2004, Jean Eid wrote:> Dear All, > I have been struggling to understand why for the housing data in MASS > library R and stata give coef. estimates that are really different. I also > tried to come up with many many examples myself (see below, of course I > did not have the set.seed command included) and all of my > `random' examples seem to give verry similar output. For the housing data, > I have changed the data into numeric vectors instead of factors/ordered > factors. I did so to try and get the same results as in STATA and to have > the housing example as close as possible to the one I constructed. > > I run a debian sid, kernel 2.4, R 2.0.0, and STATA version 8.2, MASS > version 7.2-8. > > > here's the example ( I assume that you have STATA installed and can run in > batch mode, if not the output is also given below) >That example shows the same results with Stata and polr() from MASS. For the housing data, I also get the same coefficients in Stata as with polr(): In R: library(MASS) library(foreign) write.dta(housing, file="housing.dta") house.probit<-polr(Sat ~ Infl + Type + Cont, data = housing, weights = Freq, method = "probit") summary(house.probit) ------------------------- Re-fitting to get Hessian Call: polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq, method = "probit") Coefficients: Value Std. Error t value InflMedium 0.3464233 0.06413706 5.401297 InflHigh 0.7829149 0.07642620 10.244063 TypeApartment -0.3475372 0.07229093 -4.807480 TypeAtrium -0.2178874 0.09476607 -2.299213 TypeTerrace -0.6641737 0.09180004 -7.235005 ContHigh 0.2223862 0.05812267 3.826153 Intercepts: Value Std. Error t value Low|Medium -0.2998 0.0762 -3.9371 Medium|High 0.4267 0.0764 5.5850 Residual Deviance: 3479.689 AIC: 3495.689 ------------------------ In Stata ----------------- . use housing.dta . xi: oprobit Sat i.Infl i.Type i.Cont [fw=Freq] i.Infl _IInfl_1-3 (naturally coded; _IInfl_1 omitted) i.Type _IType_1-4 (naturally coded; _IType_1 omitted) i.Cont _ICont_1-2 (naturally coded; _ICont_1 omitted) Iteration 0: log likelihood = -1824.4388 Iteration 1: log likelihood = -1739.9254 Iteration 2: log likelihood = -1739.8444 Ordered probit estimates Number of obs = 1681 LR chi2(6) = 169.19 Prob > chi2 = 0.0000 Log likelihood = -1739.8444 Pseudo R2 = 0.0464 ------------------------------------------------------------------------------ Sat | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _IInfl_2 | .3464228 .064137 5.40 0.000 .2207165 .472129 _IInfl_3 | .7829146 .076426 10.24 0.000 .6331224 .9327069 _IType_2 | -.3475367 .0722908 -4.81 0.000 -.4892241 -.2058493 _IType_3 | -.2178875 .094766 -2.30 0.021 -.4036254 -.0321497 _IType_4 | -.6641735 .0917999 -7.24 0.000 -.844098 -.484249 _ICont_2 | .2223858 .0581226 3.83 0.000 .1084676 .336304 -------------+---------------------------------------------------------------- _cut1 | -.2998279 .0761537 (Ancillary parameters) _cut2 | .4267208 .0764043 ------------------------------------------------------------------------------ -thomas