Liviu Andronic
2010-Mar-16 22:24 UTC
[R] plm "within" models: is the correct F-statistic reported?
Dear R users I get different F-statistic results for a "within" model, when using "time" or "twoways" effects in plm() [1] and when manually specifying the time control dummies [2]. [1] vignette("plm") [2] http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdf Two examples below: library("AER") data("Grunfeld", package = "AER") library("plm") gr <- subset(Grunfeld, firm %in% c("General Electric", "General Motors", "IBM")) pgr <- plm.data(gr, index = c("firm", "year"))> dim(pgr)[1] 60 5 ## the first example is actually on "individual" effects> gr_fe <- plm(invest ~ value + capital, data = pgr,+ model = "within", effect="individual")> summary(gr_fe)Oneway (individual) effect Within Model Call: plm(formula = invest ~ value + capital, data = pgr, effect = "individual", model = "within") Balanced Panel: n=3, T=20, N=60 Residuals : Min. 1st Qu. Median 3rd Qu. Max. -167.00 -26.10 2.09 26.80 202.00 Coefficients : Estimate Std. Error t-value Pr(>|t|) value 0.1049 0.0163 6.42 0.000000033 *** capital 0.3453 0.0244 14.16 < 2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Total Sum of Squares: 1890000 Residual Sum of Squares: 244000 F-statistic: 185.407 on 2 and 55 DF, p-value: <2e-16> summary(fixef(gr_fe))Estimate Std. Error t-value Pr(>|t|) General Motors -70.6 67.8 -1.04 0.30 General Electric -239.6 32.8 -7.29 3e-13 *** IBM -24.6 16.2 -1.52 0.13 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1> gr_lm <- lm(invest ~ 0+value + capital + firm, data = pgr) > summary(gr_lm)Call: lm(formula = invest ~ 0 + value + capital + firm, data = pgr) Residuals: Min 1Q Median 3Q Max -167.33 -26.14 2.09 26.84 201.68 Coefficients: Estimate Std. Error t value Pr(>|t|) value 0.1049 0.0163 6.42 0.0000000330 *** capital 0.3453 0.0244 14.16 < 2e-16 *** firmGeneral Motors -70.5629 67.8196 -1.04 0.30 firmGeneral Electric -239.5560 32.8400 -7.29 0.0000000012 *** firmIBM -24.6480 16.1726 -1.52 0.13 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 66.6 on 55 degrees of freedom Multiple R-squared: 0.974, Adjusted R-squared: 0.972 F-statistic: 420 on 5 and 55 DF, p-value: <2e-16 In the first case, plm(..., effect="individual"), F-statistic: 185.407 and in the second F-statistic: 420, while all other regression coefficients and standard errors are the same. Which F-statistic should be considered? ## A second example with "twoways" effects> gr_fe1 <- plm(invest ~ value + capital, data = pgr,+ model = "within", effect="twoways")> summary(gr_fe1)Twoways effects Within Model Call: plm(formula = invest ~ value + capital, data = pgr, effect = "twoways", model = "within") Balanced Panel: n=3, T=20, N=60 Residuals : Min. 1st Qu. Median 3rd Qu. Max. -153.00 -29.10 2.23 34.80 125.00 Coefficients : Estimate Std. Error t-value Pr(>|t|) value 0.1295 0.0224 5.77 1.4e-06 *** capital 0.4184 0.0353 11.85 5.5e-14 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Total Sum of Squares: 957000 Residual Sum of Squares: 138000 F-statistic: 107.246 on 2 and 36 DF, p-value: 6.84e-16> #summary(fixef(gr_fe1, "time")) > gr_fe2 <- plm(invest ~ value + capital + year, data = pgr,+ model = "within", effect="individual")> summary(gr_fe2)Oneway (individual) effect Within Model Call: plm(formula = invest ~ value + capital + year, data = pgr, effect "individual", model = "within") Balanced Panel: n=3, T=20, N=60 Residuals : Min. 1st Qu. Median 3rd Qu. Max. -153.00 -29.10 2.23 34.80 125.00 Coefficients : Estimate Std. Error t-value Pr(>|t|) value 0.1295 0.0224 5.77 1.4e-06 *** capital 0.4184 0.0353 11.85 5.5e-14 *** year[T.1936] -83.9625 53.6143 -1.57 0.1261 year[T.1937] -150.9206 58.3282 -2.59 0.0139 * year[T.1938] -81.2343 50.7175 -1.60 0.1180 year[T.1939] -137.4579 53.4385 -2.57 0.0144 * year[T.1940] -96.3584 53.9837 -1.78 0.0827 . year[T.1941] -56.5587 53.0089 -1.07 0.2931 year[T.1942] -36.6539 50.9966 -0.72 0.4769 year[T.1943] -78.0794 52.0249 -1.50 0.1421 year[T.1944] -66.4725 52.5047 -1.27 0.2136 year[T.1945] -89.5562 54.2876 -1.65 0.1077 year[T.1946] -59.1147 55.3115 -1.07 0.2923 year[T.1947] -87.5444 52.6530 -1.66 0.1051 year[T.1948] -119.9125 53.3167 -2.25 0.0307 * year[T.1949] -167.9552 54.1999 -3.10 0.0038 ** year[T.1950] -172.7676 55.0212 -3.14 0.0034 ** year[T.1951] -191.1369 57.6114 -3.32 0.0021 ** year[T.1952] -195.4503 59.6377 -3.28 0.0023 ** year[T.1953] -174.6639 66.3451 -2.63 0.0124 * year[T.1954] -181.1273 68.5794 -2.64 0.0121 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Total Sum of Squares: 1890000 Residual Sum of Squares: 138000 F-statistic: 21.8327 on 21 and 36 DF, p-value: 1.32e-14 In the first case, F-statistic: 107.246 , while in the second F-statistic: 21.8327. Again, which one is correct? Thank you Liviu -- Do you know how to read? http://www.alienetworks.com/srtest.cfm http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader Do you know how to write? http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail
Achim Zeileis
2010-Mar-16 23:39 UTC
[R] plm "within" models: is the correct F-statistic reported?
> Dear R users > I get different F-statistic results for a "within" model, when using > "time" or "twoways" effects in plm() [1] and when manually specifying > the time control dummies [2]. > [1] vignette("plm") > [2] http://cran.r-project.org/doc/contrib/Farnsworth-EconometricsInR.pdfWell, the question is incomplete in a way. An F-statistic is always associated with testing a model against some restricted version of that model. And which restricted model is reasonable might vary with your application. You used: data("Grunfeld", package = "AER") library("plm") gr <- subset(Grunfeld, firm %in% c("General Electric", "General Motors", "IBM")) pgr <- plm.data(gr, index = c("firm", "year")) and then considered gr_fe <- plm(invest ~ value + capital, data = pgr, model = "within", effect = "individual") which you correctly pointed out is equivalent to gr_lm <- lm(invest ~ 0 + value + capital + firm, data = pgr) The difference between the two is that in "gr_fe" the model knows that the parameters of interest are "value" and "capital" and that the firm-specific intercepts are nuisance parameters (or at least of less importance than value/capital). In "gr_lm" however, the fitted model does not know about that. It just knows that you forced out the intercept (and doesn't check that a firm-specific intercept is in fact included). Hence, when saying summary() different models with "no effects" are assumed. For gr_fe the model without effects just omits value/capital but keeps the firm-specific interecepts. For gr_lm not even the intercept is kept in the model. Thus: gr_fe_null <- lm(invest ~ 0 + firm, data = pgr) gr_lm_null <- lm(invest ~ 0, data = pgr) Then, comparing the full model (gr_lm) against the different null models yields: R> anova(gr_fe_null, gr_lm) Analysis of Variance Table Model 1: invest ~ 0 + firm Model 2: invest ~ 0 + value + capital + firm Res.Df RSS Df Sum of Sq F Pr(>F) 1 57 1888946 2 55 243985 2 1644961 185.41 < 2.2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 R> anova(gr_lm_null, gr_lm) Analysis of Variance Table Model 1: invest ~ 0 Model 2: invest ~ 0 + value + capital + firm Res.Df RSS Df Sum of Sq F Pr(>F) 1 60 9553385 2 55 243985 5 9309400 419.71 < 2.2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1> In the first case, plm(..., effect="individual"), F-statistic: 185.407 > and in the second F-statistic: 420, while all other regression > coefficients and standard errors are the same. Which F-statistic > should be considered?It depends what you want to test. But I doubt that the one reported in summary(gr_lm) tests a useful hypothesis/alternative. Best, Z
Liviu Andronic
2010-Mar-17 13:38 UTC
[R] plm "within" models: is the correct F-statistic reported?
Dear Achim On 3/16/10, Achim Zeileis <Achim.Zeileis at uibk.ac.at> wrote:> Hence, when saying summary() different models with "no effects" are > assumed. For gr_fe the model without effects just omits value/capital but > keeps the firm-specific interecepts. For gr_lm not even the intercept is > kept in the model. Thus: > > gr_fe_null <- lm(invest ~ 0 + firm, data = pgr) > gr_lm_null <- lm(invest ~ 0, data = pgr) >What would be the more useful "no effects" model in the plm(..., effect="twoways") case? Considering the same setting, library("AER") data("Grunfeld", package = "AER") library("plm") gr <- subset(Grunfeld, firm %in% c("General Electric", "General Motors", "IBM")) pgr <- plm.data(gr, index = c("firm", "year")) I am fitting a "twoways" model and an "individual" with manually specified time effects.> gr_fe1 <- plm(invest ~ value + capital, data = pgr,+ model = "within", effect="twoways")> summary(gr_fe1)Twoways effects Within Model Call: plm(formula = invest ~ value + capital, data = pgr, effect = "twoways", model = "within") Balanced Panel: n=3, T=20, N=60 Residuals : Min. 1st Qu. Median 3rd Qu. Max. -153.00 -29.10 2.23 34.80 125.00 Coefficients : Estimate Std. Error t-value Pr(>|t|) value 0.1295 0.0224 5.77 1.4e-06 *** capital 0.4184 0.0353 11.85 5.5e-14 *** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Total Sum of Squares: 957000 Residual Sum of Squares: 138000 F-statistic: 107.246 on 2 and 36 DF, p-value: 6.84e-16> gr_fe2 <- plm(invest ~ value + capital + year, data = pgr,+ model = "within", effect="individual")> summary(gr_fe2)Oneway (individual) effect Within Model Call: plm(formula = invest ~ value + capital + year, data = pgr, effect "individual", model = "within") Balanced Panel: n=3, T=20, N=60 Residuals : Min. 1st Qu. Median 3rd Qu. Max. -153.00 -29.10 2.23 34.80 125.00 Coefficients : Estimate Std. Error t-value Pr(>|t|) value 0.1295 0.0224 5.77 1.4e-06 *** capital 0.4184 0.0353 11.85 5.5e-14 *** year1936 -83.9625 53.6143 -1.57 0.1261 year1937 -150.9206 58.3282 -2.59 0.0139 * year1938 -81.2343 50.7175 -1.60 0.1180 year1939 -137.4579 53.4385 -2.57 0.0144 * year1940 -96.3584 53.9837 -1.78 0.0827 . year1941 -56.5587 53.0089 -1.07 0.2931 year1942 -36.6539 50.9966 -0.72 0.4769 year1943 -78.0794 52.0249 -1.50 0.1421 year1944 -66.4725 52.5047 -1.27 0.2136 year1945 -89.5562 54.2876 -1.65 0.1077 year1946 -59.1147 55.3115 -1.07 0.2923 year1947 -87.5444 52.6530 -1.66 0.1051 year1948 -119.9125 53.3167 -2.25 0.0307 * year1949 -167.9552 54.1999 -3.10 0.0038 ** year1950 -172.7676 55.0212 -3.14 0.0034 ** year1951 -191.1369 57.6114 -3.32 0.0021 ** year1952 -195.4503 59.6377 -3.28 0.0023 ** year1953 -174.6639 66.3451 -2.63 0.0124 * year1954 -181.1273 68.5794 -2.64 0.0121 * --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Total Sum of Squares: 1890000 Residual Sum of Squares: 138000 F-statistic: 21.8327 on 21 and 36 DF, p-value: 1.32e-14 Following the reasoning in your previous e-mail, I assume that the (more useful) "no effects" model used in the "twoways" case is> gr_fe1_null <- lm(invest ~ 0 + firm + year, data = pgr)However I cannot replicate the F-statistic: 107.246.> anova(gr_fe1_null, gr_fe1)Analysis of Variance Table Response: invest Df Sum Sq Mean Sq F value Pr(>F) firm 3 7664439 2554813 101.46 <2e-16 *** year 19 932060 49056 1.95 0.040 * Residuals 38 956886 25181 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Warning message: In anova.lmlist(object, ...) : models with response "NULL" removed because response differs from model 1> anova(gr_fe1_null, gr_fe2)Analysis of Variance Table Response: invest Df Sum Sq Mean Sq F value Pr(>F) firm 3 7664439 2554813 101.46 <2e-16 *** year 19 932060 49056 1.95 0.040 * Residuals 38 956886 25181 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Warning message: In anova.lmlist(object, ...) : models with response "NULL" removed because response differs from model 1 In the case of "individual" with manually introduced time effects, I assume the following null is used:> gr_fe2_null <- lm(invest ~ 0 + firm, data = pgr)But even here I cannot replicate the F-statistic: 21.8327.> anova(gr_fe2_null, gr_fe2)Analysis of Variance Table Response: invest Df Sum Sq Mean Sq F value Pr(>F) firm 3 7664439 2554813 77.1 <2e-16 *** Residuals 57 1888946 33139 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Warning message: In anova.lmlist(object, ...) : models with response "NULL" removed because response differs from model 1 Am I doing something wrong? Liviu