Hi everyone, I am looking to do some manual maximum likelihood estimation in R. I have done a lot of work in Stata and so I have been using output comparisons to get a handle on what is happening. I estimated a simple linear model in R with lm() and also my own maximum likelihood program. I then compared the output with Stata. Two things jumped out at me. Firstly, in Stata my coefficient estimates are identical in both the OLS estimation by -reg- and the maximum likelihood estimation using Stata's ml lf command. In R my coefficient estimates differ slightly between lm() and my own maximum likelihood estimation. Secondly, the estimates for sigma2 are very different between R and Stata. 3.14 in R compared to 1.78 in Stata. I have copied my maximum likelihood program below. It is copied from a great intro to MLE in R by Macro Steenbergen http://artsci.wustl.edu/~jmonogan/computing/r/MLE_in_R.pdf Any comments are welcome. In particular I would like to know why the estimate of sigma2 is so different. I would also like to know about the accuracy of the coefficient estimates. ## ols ols <- lm(Kmenta$consump ~ Kmenta$price + Kmenta$income) coef(summary(ols)) ## mle y <- matrix(Kmenta$consump) x <- cbind(1, Kmenta$price, Kmenta$income) ols.lf <- function(theta, y, x) { N <- nrow(y) K <- ncol(x) beta <- theta[1:K] sigma2 <- theta[K+1] e <- y - x%*%beta logl <- -0.5*N*log(2*pi)-0.5*N*log(sigma2)-((t(e)%*%e)/(2*sigma2)) return(-logl) } p <- optim(c(0,0,0,2), ols.lf, method="BFGS", hessian=T, y=y, x=x) OI <- solve(p$hessian) se <- sqrt(diag(OI)) se <- se[1:3] beta <- p$par[1:3] results <- cbind(beta, se) results sigma2 <- p$par[4] sigma2 Kind regards, Alex Olssen Motu Research Wellington New Zealand
Peter Ehlers
2011-Mar-28 15:08 UTC
[R] maximum likelihood accuracy - comparison with Stata
On 2011-03-27 21:37, Alex Olssen wrote:> Hi everyone, > > I am looking to do some manual maximum likelihood estimation in R. I > have done a lot of work in Stata and so I have been using output > comparisons to get a handle on what is happening. > > I estimated a simple linear model in R with lm() and also my own > maximum likelihood program. I then compared the output with Stata. > Two things jumped out at me. > > Firstly, in Stata my coefficient estimates are identical in both the > OLS estimation by -reg- and the maximum likelihood estimation using > Stata's ml lf command. > In R my coefficient estimates differ slightly between lm() and my > own maximum likelihood estimation. > > Secondly, the estimates for sigma2 are very different between R > and Stata. 3.14 in R compared to 1.78 in Stata. > > I have copied my maximum likelihood program below. It is copied from > a great intro to MLE in R by Macro Steenbergen > http://artsci.wustl.edu/~jmonogan/computing/r/MLE_in_R.pdf > > Any comments are welcome. In particular I would like to know why the > estimate of sigma2 is so different. I would also like to know > about the accuracy of the coefficient estimates.Some comments: 1. Since Kmenta is not in the datasets package, you should mention where you're getting it. (I suppose that for economists, it's obvious, but we're not all economists.) I used the version in the systemfit package. 2. I don't believe your Stata value for sigma2 (by which I assume you mean sigma^2). Are you quoting sigma? 3. I can't reproduce your R value of 3.14 for sigma2. I get 3.725391. 4. (most important) There's a difference between the simple ML estimate (which is biased) and R's unbiased estimate for sigma^2. This dataset has 20 cases, so try multiplying your result by 20/17. 5. As to any difference in coefficients, I would guess that the difference is slight (you don't show what it is); it may be due to the fact that optim() produces approximate solutions, whereas in this simple case, an 'exact' solution is possible (and found by R). 6. In case you weren't aware of it: the stats4 package has an mle() function. Peter Ehlers> > ## ols > ols<- lm(Kmenta$consump ~ Kmenta$price + Kmenta$income) > coef(summary(ols)) > > ## mle > y<- matrix(Kmenta$consump) > x<- cbind(1, Kmenta$price, Kmenta$income) > ols.lf<- function(theta, y, x) { > N<- nrow(y) > K<- ncol(x) > beta<- theta[1:K] > sigma2<- theta[K+1] > e<- y - x%*%beta > logl<- -0.5*N*log(2*pi)-0.5*N*log(sigma2)-((t(e)%*%e)/(2*sigma2)) > return(-logl) > } > p<- optim(c(0,0,0,2), ols.lf, method="BFGS", hessian=T, y=y, x=x) > OI<- solve(p$hessian) > se<- sqrt(diag(OI)) > se<- se[1:3] > beta<- p$par[1:3] > results<- cbind(beta, se) > results > sigma2<- p$par[4] > sigma2 > > Kind regards, > > Alex Olssen > Motu Research > Wellington > New Zealand > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.