Yuan Yuan
2011-Nov-25 03:37 UTC
[R] Unable to reproduce Stata Heckman sample selection estimates
Hello,

I am working on reproducing someone's analysis that was done in Stata. The analysis is the estimation of a standard Heckman sample selection model (Tobit-2), for which I am using the sampleSelection package and its selection() function. I have a few problems with the estimation:

1) The reported standard error for every estimate is Inf; vcov(selectionObject) yields Inf in every cell.

2) While the selection equation coefficient estimates are almost exactly the same as the Stata results, the outcome equation coefficient estimates are quite different (a different sign in one case, an order-of-magnitude difference in some other cases).

3) I can't figure out how to specify the initial values for the MLE. Whatever I pass to the start argument (even coef(selectionObject)), I get the following error:

Error in gr[, fixed] <- NA : (subscript) logical subscript too long

I have to admit I am pretty confused by #1; I feel like I must be doing something wrong or missing something obvious, but I have no idea what. I suspect #2 might be because the two algorithms (selection() and Stata's) are simply finding different local maxima, but because of #3 I can't test that guess by using different initial values in selection().

Let me know if I should provide any more information. Thanks in advance for any pointers in the right direction.

- Clara
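For reference, the standard Tobit-2 (Heckman) model the question refers to can be sketched as follows. The notation ($y^S$, $y^O$, $\beta^S$, $\beta^O$, $\sigma$, $\rho$) is an illustrative assumption, chosen to resemble the notation of the sampleSelection paper cited in the reply; neither poster wrote these equations. The sketch shows that the one-step ML estimator works over a parameter vector containing $\sigma$ and $\rho$ in addition to both sets of coefficients, whereas the two-step estimator fits the outcome equation by adding an inverse Mills ratio term.

```latex
% Selection equation (probit): only the sign of y^{S*} is observed
\[ y_i^{S*} = \beta^{S\prime} x_i^S + \varepsilon_i^S, \qquad
   y_i^S = \mathbf{1}\{\, y_i^{S*} > 0 \,\} \]

% Outcome equation: y_i^O is observed only when y_i^S = 1
\[ y_i^O = \beta^{O\prime} x_i^O + \varepsilon_i^O \]

% Errors: bivariate normal, variance of the selection error normalized to 1
\[ \begin{pmatrix} \varepsilon_i^S \\ \varepsilon_i^O \end{pmatrix}
   \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
   \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix} \right) \]

% One-step ML: log-likelihood over (beta^S, beta^O, sigma, rho)
\[ \ell(\beta^S, \beta^O, \sigma, \rho)
   = \sum_{i:\, y_i^S = 0} \log \Phi\!\left( -\beta^{S\prime} x_i^S \right)
   + \sum_{i:\, y_i^S = 1} \Bigg[ -\tfrac{1}{2}\log(2\pi) - \log\sigma
     - \tfrac{1}{2}\left( \frac{y_i^O - \beta^{O\prime} x_i^O}{\sigma} \right)^{2}
     + \log \Phi\!\left( \frac{\beta^{S\prime} x_i^S
         + \frac{\rho}{\sigma}\left( y_i^O - \beta^{O\prime} x_i^O \right)}
         {\sqrt{1 - \rho^2}} \right) \Bigg] \]

% Two-step (Heckit): conditional mean used in the second-stage regression,
% with lambda the inverse Mills ratio evaluated at the first-stage probit index
\[ E\!\left[ y_i^O \mid y_i^S = 1 \right]
   = \beta^{O\prime} x_i^O + \rho\sigma\, \lambda\!\left( \beta^{S\prime} x_i^S \right),
   \qquad \lambda(z) = \frac{\phi(z)}{\Phi(z)} \]
```

In this parameterization the selection-equation coefficients enter both estimators through the same probit part, while the outcome coefficients also depend on how $\sigma$ and $\rho$ are estimated.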
Arne Henningsen
2011-Nov-25 09:04 UTC
[R] Unable to reproduce Stata Heckman sample selection estimates
On 25 November 2011 04:37, Yuan Yuan <y.yuan at vt.edu> wrote:
> [...]
> Let me know if I should provide any more information. Thanks in
> advance for any pointers in the right direction.

Yes, please provide more information (see also the posting guide [1]). For example: Which version of R and which version of the sampleSelection package are you using? Do you estimate the model by the two-step approach or by the one-step maximum likelihood method? Which commands did you use? Can you send us a reproducible example? Have you read the paper about using the sampleSelection package [2]?
[1] http://www.r-project.org/posting-guide.html
[2] http://www.jstatsoft.org/v27/i07

Best wishes from Copenhagen,
Arne

--
Arne Henningsen
http://www.arne-henningsen.name