Hélène Huber-Yahi
2013-Nov-07 12:48 UTC
[R] AER ivreg diagnostics: question on DF of Sargan test
Hello,

I'm new to R and I'm currently learning to use the AER package, which is extremely comprehensive and useful. I have one question about the diagnostics after ivreg(). If I understand correctly, the Sargan test reported there treats the statistic as chi-squared with degrees of freedom equal to the number of excluded instruments minus one. However, I have read many times that the degrees of freedom of this statistic should equal the number of overidentifying restrictions, i.e. the number of excluded instruments minus the number of endogenous regressors. When comparing with Stata (estat overid after ivreg, and likewise the ivreg2 output), the statistic is the same as the one reported by R. Only the p-value differs, because a different reference distribution is used. Is this command using a different flavor of the Sargan test? I could not find the details in the AER PDF documentation.

I'm using RStudio with R 3.0.2 (Windows 7), and AER is up to date. The output I get from R is below: the Sargan df is 5, while I expected 6 - 3 = 3. The data come from Verbeek's econometrics textbook, and the example replicates the one in the book. The dependent variable is the log wage; the endogenous regressors are education, experience and experience squared (3 of them); the excluded instruments are the parents' education etc. (6 of them).

> ivmodel <- ivreg(lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 + south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 + black + smsa76 + south76,
+ data = school)
>
> summary(ivmodel,diagnostics=TRUE)

Call:
ivreg(formula = lwage76 ~ ed76 + exp76 + exp762 + black + smsa76 +
    south76 | daded + momed + libcrd14 + age76 + age762 + nearc4 +
    black + smsa76 + south76, data = school)

Residuals:
     Min       1Q   Median       3Q      Max
-1.63375 -0.22253  0.02403  0.24350  1.32911

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.6064811  0.1126195  40.903  < 2e-16 ***
ed76         0.0848507  0.0066061  12.844  < 2e-16 ***
exp76        0.0796432  0.0164406   4.844 1.34e-06 ***
exp762      -0.0020376  0.0008257  -2.468   0.0136 *
black       -0.1726723  0.0195231  -8.845  < 2e-16 ***
smsa76       0.1521693  0.0165207   9.211  < 2e-16 ***
south76     -0.1204765  0.0154904  -7.778 1.01e-14 ***

Diagnostic tests:
                 df1  df2 statistic p-value
Weak instruments   6 2987   965.450  <2e-16 ***
Wu-Hausman         2 2988     1.949   0.143
Sargan             5   NA     3.868   0.569
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3753 on 2990 degrees of freedom
Multiple R-Squared: 0.2868, Adjusted R-squared: 0.2854
Wald test: 178.6 on 6 and 2990 DF,  p-value: < 2.2e-16

Could this be because I'm estimating the IV model by 2SLS rather than GMM (at least I assume so)? I apologize if this comes from a misunderstanding on my part, and I thank you in advance for your help.

Best,
H. Huber
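As a quick check of the numbers above, the p-values under the two candidate degrees of freedom can be compared with base R's pchisq(). This is a sketch that assumes the printed statistic 3.868 (rounded to three decimals in the output) and counts the excluded instruments and endogenous regressors from the model formula in the post.

```r
# Sargan statistic as printed in the summary output (rounded to 3 decimals)
sargan_stat <- 3.868

# Counts taken from the model formula in the post
n_excluded <- 6  # daded, momed, libcrd14, age76, age762, nearc4
n_endog    <- 3  # ed76, exp76, exp762

# df as reported in this ivreg() output: excluded instruments minus one
df_reported <- n_excluded - 1
# df for an overidentification test: excluded instruments minus endogenous regressors
df_overid   <- n_excluded - n_endog

# Upper-tail chi-squared probabilities for the same statistic
p_reported <- pchisq(sargan_stat, df = df_reported, lower.tail = FALSE)
p_overid   <- pchisq(sargan_stat, df = df_overid,   lower.tail = FALSE)

round(c(df5 = p_reported, df3 = p_overid), 3)
```

The first value reproduces the 0.569 shown in the output; the second, about 0.28, is the p-value against a chi-squared with 3 degrees of freedom. The statistic is identical in both cases and only the reference distribution changes, which is consistent with the difference from Stata's p-value described above.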
Achim Zeileis
2013-Nov-07 18:07 UTC
[R] AER ivreg diagnostics: question on DF of Sargan test
Hélène,

thanks for spotting this! This is a bug in "AER". I had only tested the new diagnostics on regressions with a single endogenous variable and hence never noticed the problem. With more than one endogenous variable, the degrees of freedom used in the ivreg() diagnostics are too large, and so are the associated p-values. I've fixed the problem in the development version of AER and will release it on CRAN in the next few days.

Thanks & best regards,
Z
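To check the degrees of freedom independently of the installed AER version, the Sargan statistic can also be computed by hand. The sketch below uses hypothetical simulated data (not Verbeek's school data; all variable names and coefficients are made up) with two endogenous regressors and four excluded instruments. It estimates the model by 2SLS with base R matrix algebra, takes n times the R-squared of the regression of the 2SLS residuals on all instruments, and refers it to a chi-squared with excluded instruments minus endogenous regressors degrees of freedom.

```r
set.seed(1)
n <- 2000

# Hypothetical simulated data: 4 excluded instruments, 2 endogenous regressors,
# 1 exogenous control; the instruments are valid by construction.
z  <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("z", 1:4)))
w  <- rnorm(n)                    # exogenous control (also an included instrument)
v  <- rnorm(n)                    # common shock that makes x1 and x2 endogenous
x1 <- z %*% c(1, 0.5, 0, 0.3) + v + rnorm(n)
x2 <- z %*% c(0, 0.4, 1, 0.2) + v + rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + 0.2 * w + v + rnorm(n)

X <- cbind(1, x1, x2, w)          # regressors, with intercept
Z <- cbind(1, z, w)               # all instruments, with intercept

# 2SLS: project X on Z, then beta = (Xhat'X)^{-1} Xhat'y
Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))
beta <- solve(crossprod(Xhat, X), crossprod(Xhat, y))
u    <- as.vector(y - X %*% beta) # 2SLS residuals use the original X, not Xhat

# Sargan statistic: n * R^2 from regressing the 2SLS residuals on all instruments
aux    <- lm.fit(Z, u)
r2     <- 1 - sum(aux$residuals^2) / sum((u - mean(u))^2)
sargan <- n * r2

# Degrees of freedom = number of overidentifying restrictions
n_excluded <- ncol(z)             # 4 excluded instruments
n_endog    <- 2                   # x1 and x2
df_sargan  <- n_excluded - n_endog
p_value    <- pchisq(sargan, df = df_sargan, lower.tail = FALSE)

c(statistic = sargan, df = df_sargan, p.value = p_value)
```

The same counting rule applied to the model in the original post gives 6 - 3 = 3 degrees of freedom.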