Liviu Andronic

2010-Feb-04 11:32 UTC

### [R] plm issues: error for "within" or "random", but not for "pooling"

Dear all,

I am working on unbalanced panel data and I can readily fit a "pooling" model using plm(), but not a "within" or "random" model. Reproducing the examples in vignette("plm") and in the AER package I encountered no such issues.

    ##unfortunately I cannot disclose the data, and it is too big anyway
    > dim(ibes.kld.exp.p[x.subs , ])
    [1] 13189    34
    > summary(ibes.kld.exp.p[x.subs , ]$ibes1y.meanest)
    total sum of squares : 28058
          id     time
    0.752284 0.018656
    > summary(ibes.kld.exp.p[x.subs , ]$employee_kld)
    total sum of squares : 9146.5
          id     time
    0.637098 0.073421

    ##fitting a pooling model works OK
    > x <- plm(ibes1y.meanest ~ employee_kld, ibes.kld.exp.p[x.subs , ], model="pooling")
    > summary(x)
    Oneway (individual) effect Pooling Model

    Call:
    plm(formula = ibes1y.meanest ~ employee_kld, data = ibes.kld.exp.p[x.subs,
        ], model = "pooling")

    Unbalanced Panel: n=3041, T=1-16, N=13189

    Residuals :
       Min. 1st Qu.  Median 3rd Qu.    Max.
     -6.530  -0.871  -0.189   0.629  13.200

    Coefficients :
                 Estimate Std. Error t-value Pr(>|t|)
    (Intercept)    1.5607     0.0127  122.73  < 2e-16 ***
    employee_kld   0.1118     0.0152    7.35  2.2e-13 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Total Sum of Squares:    28100
    Residual Sum of Squares: 27900
    F-statistic: 53.954 on 1 and 13187 DF, p-value: 2.17e-13
    > plmtest(x, "individual")

        Lagrange Multiplier Test - (Honda)

    data:  ibes1y.meanest ~ employee_kld
    normal = 1675.7, p-value < 2.2e-16
    alternative hypothesis: significant effects

    ##fitting a within or random model fails
    > x <- plm(ibes1y.meanest ~ employee_kld, ibes.kld.exp.p[x.subs , ], model="within")
    Error in Tapply.matrix(x, effect, mean, ...) : subscript out of bounds
    > x <- plm(ibes1y.meanest ~ employee_kld, ibes.kld.exp.p[x.subs , ], model="random")
    Error in Tapply.matrix(x, effect, mean, ...) : subscript out of bounds

Would this be an issue with my data (which is a bit specific, since employee_kld is categorical)? Or perhaps there is an issue in plm() for unbalanced data?

Please let me know your opinion
Liviu

    > sessionInfo()
    R version 2.10.1 (2009-12-14)
    x86_64-pc-linux-gnu

    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
     [1] tcltk     grid      splines   stats     graphics  grDevices utils
     [8] datasets  methods   base

    other attached packages:
     [1] RcmdrPlugin.sos_0.2-0    tcltk2_1.1-1             RcmdrPlugin.Export_0.3-0
     [4] Hmisc_3.7-0              xtable_1.5-6             Rcmdr_1.5-5
     [7] car_1.2-16               ggplot2_0.8.5            digest_0.4.2
    [10] reshape_0.8.3            plyr_0.1.9               proto_0.3-8
    [13] plm_1.2-3                sandwich_2.2-5           zoo_1.6-2
    [16] MASS_7.3-5               Formula_0.2-0            kinship_1.1.0-23
    [19] lattice_0.18-3           nlme_3.1-96              survival_2.35-8
    [22] fortunes_1.3-7           sos_1.2-4                brew_1.0-3
    [25] hints_1.0.1-1

    loaded via a namespace (and not attached):
    [1] cluster_1.12.1 tools_2.10.1

--
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail

Millo Giovanni

2010-Feb-04 14:10 UTC

### [R] Re: plm issues: error for "within" or "random", but not for "pooling"

Dear Liviu,

it's difficult to tell without seeing the data. I might guess that you have some completely empty groups about which Tapply complains when doing the time-demeaning, but it would be just a guess.

I realize you can't share the data in the present form, but may I suggest you try and subset your data in some random way, find a "problematic" subset (one which gives the error), then change labels and everything so that the data become unrecognizable, and send us that example? You can also randomly transform them, as this is likely to be a missing values issue.

Giovanni

-----Original message-----
From: Liviu Andronic [mailto:landronimirc at gmail.com]
Sent: Thursday, 4 February 2010 12:32
To: r-help at r-project.org Help
Cc: yves.croissant at let.ish-lyon.cnrs.fr; Millo Giovanni
Subject: plm issues: error for "within" or "random", but not for "pooling"

Giovanni Millo
Research Dept., Assicurazioni Generali SpA
Via Machiavelli 4, 34132 Trieste (Italy)
tel. +39 040 671184
fax +39 040 671160
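The checks suggested in this reply could be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the toy data frame `panel`, its columns `id`, `year`, `y`, `x`, and the helper `anonymise()` are all hypothetical stand-ins for the undisclosed data, and the sketch does not reproduce the plm() error itself. It only shows how a subset can carry empty groups (unused factor levels), how to look for them and for missing values, and one way to disguise a subset before posting it.

```r
# Toy unbalanced panel; every name here is a hypothetical stand-in.
set.seed(1)
panel <- data.frame(
  id   = factor(rep(c("a", "b", "c", "d"), times = c(3, 1, 4, 2))),
  year = c(1:3, 1, 1:4, 1:2),
  y    = rnorm(10),
  x    = rnorm(10)
)

# Row subsetting keeps the original factor levels, so removing every
# observation of "b" leaves an empty group behind.
sub <- panel[panel$id != "b", ]

# 1. Look for empty groups and for missing values.
group_sizes  <- table(sub$id)
empty_groups <- names(group_sizes)[group_sizes == 0]
na_counts    <- colSums(is.na(sub))

# Group means over a factor with unused levels are NA for those levels.
means_before <- tapply(sub$y, sub$id, mean)

# 2. Rebuild the factor so that only observed levels remain, then retry.
sub_clean    <- sub
sub_clean$id <- factor(sub_clean$id)
means_after  <- tapply(sub_clean$y, sub_clean$id, mean)

# 3. Disguise a subset before sharing: relabel the groups and apply a
#    random affine transformation to the numeric columns. The pattern of
#    groups and of missing values is preserved; the actual values are not.
anonymise <- function(d, vars) {
  levels(d$id) <- paste0("firm", sample(nlevels(d$id)))
  for (v in vars) {
    d[[v]] <- runif(1, 1, 2) * d[[v]] + rnorm(1)
  }
  d
}
shared <- anonymise(sub_clean, c("y", "x"))
```

If `empty_groups` is non-empty on the real subset, refitting the "within" model after rebuilding the index factor would be one thing to try; whether that is the actual cause here remains a guess, as stated in the reply.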