Philipp Grueber
2013-Jan-11 02:20 UTC
[R] Manual two-way demeaning of unbalanced panel data (Wansbeek/Kapteyn transformation)
Dear R users, I wish to manually demean a panel over time and entities. I tried to code the Wansbeek and Kapteyn (1989) transformation (from Baltagi's book Ch. 9). As a benchmark I use both the pmodel.response() and model.matrix() functions in package plm and the results from using dummy variables. As far as I understood the transformation (Ch.3), Q%*%y (with y being the dependent variable) should yield the demeaned series. However, ... ...I find that the results do not match, if I do so. ...that if I am looking at a balanced panel, I get the correct results when multiplying Q with the already demeaned series y_it, Q%*%y_it. ...that if I am looking at an unbalanced panel, I receive results which differ (even though being close). I guess I am missing something. Every comment pointing me to the right solution would be of great value to me. Also, comments on how to increase the efficiency of my function would help! Please find an example based on the Grunfeld data below. Thank you very much! Philipp Grueber ########################## library(MASS) getQ<-function(object,t.index,i.index){ t.ind<-object[,t.index] i.ind<-object[,i.index] nam<-unique(i.ind) tim<-unique(t.ind) n<-nrow(object) N<-length(nam) T<-length(tim) I<-matrix(0,n,n) I[row(I)==col(I)] <- 1 I_N<-matrix(0,N,N) I_N[row(I_N)==col(I_N)] <- 1 I_T<-matrix(0,T,T) I_T[row(I_T)==col(I_T)] <- 1 D1<-data.frame() for(t in tim){ D1<-rbind(D1,I_N) } D1<-data.matrix(D1[as.vector(table(i.ind,t.ind))==1,]) D2<-data.frame() for(i in nam){ D2<-rbind(D2,I_T) } D2<-data.matrix(D2[as.vector(table(t.ind,i.ind))==1,]) D<-data.matrix(cbind(D1,D2)) Q<-I-D%*%ginv(crossprod(D))%*%t(D)#fQ(D1)-fQ(D1)%*%D2%*%ginv(t(D2)%*%fQ(D1)%*%D2)%*%t(D2)%*%fQ(D1) Q } ############################## library(plm) library(lmtest) data(Grunfeld) y_i<-Grunfeld$inv-ave(Grunfeld$inv,index=Grunfeld$firm) y_t<-Grunfeld$inv-ave(Grunfeld$inv,index=Grunfeld$year) y_it<-(Grunfeld$inv-ave(Grunfeld$inv,index=Grunfeld$firm)-ave(Grunfeld$inv,index=Grunfeld$year)+rep(mean(Grunfeld$inv),length(Grunfeld$inv))) x1_it<-(Grunfeld$value-ave(Grunfeld$value,index=Grunfeld$firm)-ave(Grunfeld$value,index=Grunfeld$year)+rep(mean(Grunfeld$value),length(Grunfeld$inv))) dem_y_i<-pmodel.response(plm(formula=inv~value,data=Grunfeld,index=c("firm","year"),model="within",effect="individual")) dem_y_t<-pmodel.response(plm(formula=inv~value,data=Grunfeld,index=c("firm","year"),model="within",effect="time")) dem_y_it<-pmodel.response(plm(formula=inv~value,data=Grunfeld,index=c("firm","year"),model="within",effect="twoways")) dem_X_it<-model.matrix(plm(formula=inv~value,data=Grunfeld,index=c("firm","year"),model="within",effect="twoways")) sum(y_i!=dem_y_i) #y_i[1:10] #dem_y_i[1:10] sum(y_t!=dem_y_t) #y_t[1:10] #dem_y_t[1:10] sum(y_it!=dem_y_it) #y_it[1:10] #dem_y_it[1:10] sum(x1_it!=dem_X_it) #x1_it[1:10] #dem_X_it[1:10,] (getQ(Grunfeld,t.index="year",i.index="firm")%*%Grunfeld$inv)[1:10] (getQ(Grunfeld,t.index="year",i.index="firm")%*%y_it)[1:10] dem_y_it[1:10] ############################ Grunfeld1<-Grunfeld[-1,] sum(ave(Grunfeld1$inv,index=Grunfeld1$firm)!=(tapply(Grunfeld1$inv,Grunfeld1$firm,function(x){mean(x)})[Grunfeld1$firm]))==0 y_i<-Grunfeld1$inv-ave(Grunfeld1$inv,index=Grunfeld1$firm) y_t<-Grunfeld1$inv-ave(Grunfeld1$inv,index=Grunfeld1$year) y_it<-(Grunfeld1$inv-ave(Grunfeld1$inv,index=Grunfeld1$firm)-ave(Grunfeld1$inv,index=Grunfeld1$year)+rep(mean(Grunfeld1$inv),length(Grunfeld1$inv))) x1_it<-(Grunfeld1$value-ave(Grunfeld1$value,index=Grunfeld1$firm)-ave(Grunfeld1$value,index=Grunfeld1$year)+rep(mean(Grunfeld1$value),length(Grunfeld1$inv))) dem_y_i<-pmodel.response(plm(formula=inv~value,data=Grunfeld1,index=c("firm","year"),model="within",effect="individual")) dem_y_t<-pmodel.response(plm(formula=inv~value,data=Grunfeld1,index=c("firm","year"),model="within",effect="time")) dem_y_it<-pmodel.response(plm(formula=inv~value,data=Grunfeld1,index=c("firm","year"),model="within",effect="twoways")) dem_X_it<-model.matrix(plm(formula=inv~value,data=Grunfeld1,index=c("firm","year"),model="within",effect="twoways")) sum(y_i!=dem_y_i) #y_i[1:10] #dem_y_i[1:10] sum(y_t!=dem_y_t) #y_t[1:10] #dem_y_t[1:10] sum(y_it!=dem_y_it) #y_it[1:10] #dem_y_it[1:10] #y_it[1:10]-dem_y_it[1:10] sum(x1_it!=dem_X_it) #x1_it[1:10] #dem_X_it[1:10,] #x1_it[1:10]-dem_X_it[1:10,] (getQ(Grunfeld1,t.index="year",i.index="firm")%*%Grunfeld1$inv)[1:10] (getQ(Grunfeld1,t.index="year",i.index="firm")%*%y_it)[1:10] dem_y_it[1:10] ----- ____________________________________ EBS Universitaet fuer Wirtschaft und Recht FARE Department Wiesbaden/ Germany http://www.ebs.edu/index.php?id=finacc&L=0 -- View this message in context: http://r.789695.n4.nabble.com/Manual-two-way-demeaning-of-unbalanced-panel-data-Wansbeek-Kapteyn-transformation-tp4655202.html Sent from the R help mailing list archive at Nabble.com.
Possibly Parallel Threads
- Unexpected result with lag() et diff() in plm package.
- lme with corAR1 errors - can't find AR coefficient in output
- panel regression with twoways random effects, on unbalanced data?
- singular matrices in plm::pgmm()
- How can I extract part of the data in a panel dataset?