Dear All, I have been struggling to understand why for the housing data in MASS library R and stata give coef. estimates that are really different. I also tried to come up with many many examples myself (see below, of course I did not have the set.seed command included) and all of my `random' examples seem to give verry similar output. For the housing data, I have changed the data into numeric vectors instead of factors/ordered factors. I did so to try and get the same results as in STATA and to have the housing example as close as possible to the one I constructed. I run a debian sid, kernel 2.4, R 2.0.0, and STATA version 8.2, MASS version 7.2-8. here's the example ( I assume that you have STATA installed and can run in batch mode, if not the output is also given below) library(MASS) library(foreign) set.seed(100) X <- rnorm(1000) X1 <- rnorm(1000) X2 <- rnorm(1000) X <- X +X1+X2 XX <- X<=quantile(X, .25) XX[X>quantile(X, .25) & X<=quantile(X, .50)] <- 2 XX[X>quantile(X, .5) & X<=quantile(X, .75)] <- 3 XX[X>quantile(X, .75)] <- 4 temp <- data.frame(XX=XX, X1=X1, X2=X2, X=X) summary(polr(factor(XX)~X1 +X2, data=temp, method="probit")) write.dta(temp, "temp.dta") #################################### #Stata stuff #################################### cat("use temp.dta\n oprobit XX X1 X2\n", file="temp.ado") system("stata -b do temp.ado&") system("cat temp.log") # ##### here's R's output ############################# Re-fitting to get Hessian Call: polr(formula = factor(XX) ~ X1 + X2, data = temp, method = "probit") Coefficients: Value Std. Error t value X1 0.9891735 0.04749225 20.82811 X2 0.9400804 0.04527653 20.76309 Intercepts: Value Std. Error t value 1|2 -1.1411 0.0572 -19.9613 2|3 -0.0372 0.0486 -0.7656 3|4 1.1101 0.0579 19.1865 Residual Deviance: 1969.734 AIC: 1979.734 ############################################## #and here Stata's output ############################################## Ordered probit estimates Number of obs 1000 LR chi2(2) 802.86 Prob > chi2 0.0000 Log likelihood = -984.86675 Pseudo R2 0.2896 ------------------------------------------------------------------------------ XX | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- X1 | .9891651 .0474922 20.83 0.000 .8960822 1.082248 X2 | .94007 .0452765 20.76 0.000 .8513298 1.02881 -------------+---------------------------------------------------------------- _cut1 | -1.141119 .0571667 (Ancillary parameters) _cut2 | -.0371779 .0485592 _cut3 | 1.110117 .0578593
On Wed, 10 Nov 2004, Jean Eid wrote:> Dear All, > I have been struggling to understand why for the housing data in MASS > library R and stata give coef. estimates that are really different. I also > tried to come up with many many examples myself (see below, of course I > did not have the set.seed command included) and all of my > `random' examples seem to give verry similar output. For the housing data, > I have changed the data into numeric vectors instead of factors/ordered > factors. I did so to try and get the same results as in STATA and to have > the housing example as close as possible to the one I constructed. > > I run a debian sid, kernel 2.4, R 2.0.0, and STATA version 8.2, MASS > version 7.2-8. > > > here's the example ( I assume that you have STATA installed and can run in > batch mode, if not the output is also given below) >That example shows the same results with Stata and polr() from MASS. For the housing data, I also get the same coefficients in Stata as with polr(): In R: library(MASS) library(foreign) write.dta(housing, file="housing.dta") house.probit<-polr(Sat ~ Infl + Type + Cont, data = housing, weights = Freq, method = "probit") summary(house.probit) ------------------------- Re-fitting to get Hessian Call: polr(formula = Sat ~ Infl + Type + Cont, data = housing, weights = Freq, method = "probit") Coefficients: Value Std. Error t value InflMedium 0.3464233 0.06413706 5.401297 InflHigh 0.7829149 0.07642620 10.244063 TypeApartment -0.3475372 0.07229093 -4.807480 TypeAtrium -0.2178874 0.09476607 -2.299213 TypeTerrace -0.6641737 0.09180004 -7.235005 ContHigh 0.2223862 0.05812267 3.826153 Intercepts: Value Std. Error t value Low|Medium -0.2998 0.0762 -3.9371 Medium|High 0.4267 0.0764 5.5850 Residual Deviance: 3479.689 AIC: 3495.689 ------------------------ In Stata ----------------- . use housing.dta . xi: oprobit Sat i.Infl i.Type i.Cont [fw=Freq] i.Infl _IInfl_1-3 (naturally coded; _IInfl_1 omitted) i.Type _IType_1-4 (naturally coded; _IType_1 omitted) i.Cont _ICont_1-2 (naturally coded; _ICont_1 omitted) Iteration 0: log likelihood = -1824.4388 Iteration 1: log likelihood = -1739.9254 Iteration 2: log likelihood = -1739.8444 Ordered probit estimates Number of obs = 1681 LR chi2(6) = 169.19 Prob > chi2 = 0.0000 Log likelihood = -1739.8444 Pseudo R2 = 0.0464 ------------------------------------------------------------------------------ Sat | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _IInfl_2 | .3464228 .064137 5.40 0.000 .2207165 .472129 _IInfl_3 | .7829146 .076426 10.24 0.000 .6331224 .9327069 _IType_2 | -.3475367 .0722908 -4.81 0.000 -.4892241 -.2058493 _IType_3 | -.2178875 .094766 -2.30 0.021 -.4036254 -.0321497 _IType_4 | -.6641735 .0917999 -7.24 0.000 -.844098 -.484249 _ICont_2 | .2223858 .0581226 3.83 0.000 .1084676 .336304 -------------+---------------------------------------------------------------- _cut1 | -.2998279 .0761537 (Ancillary parameters) _cut2 | .4267208 .0764043 ------------------------------------------------------------------------------ -thomas