Philipp Grueber
2013-Jan-11  02:20 UTC
[R] Manual two-way demeaning of unbalanced panel data (Wansbeek/Kapteyn transformation)
Dear R users,
I wish to manually demean a panel over time and entities. I tried to code
the Wansbeek and Kapteyn (1989) transformation (from Baltagi's book Ch. 9).
As a benchmark I use both the pmodel.response() and model.matrix() functions
in package plm and the results from using dummy variables. As far as I
understood the transformation (Ch.3), Q%*%y (with y being the dependent
variable) should yield the demeaned series.
However, ...
		...I find that the results do not match, if I do so.
		...that if I am looking at a balanced panel, I get the correct results
when multiplying Q with the already demeaned series y_it, Q%*%y_it.
		...that if I am looking at an unbalanced panel, I receive results which
differ (even though being close). 
I guess I am missing something. Every comment pointing me to the right
solution would be of great value to me. Also, comments on how to increase
the efficiency of my function would help!
Please find an example based on the Grunfeld data below.
Thank you very much!
Philipp Grueber
##########################
library(MASS)
getQ<-function(object,t.index,i.index){
	t.ind<-object[,t.index]
	i.ind<-object[,i.index]
	nam<-unique(i.ind)
	tim<-unique(t.ind)
	n<-nrow(object)
	N<-length(nam)
	T<-length(tim)
	I<-matrix(0,n,n)
	I[row(I)==col(I)] <- 1
	I_N<-matrix(0,N,N)
	I_N[row(I_N)==col(I_N)] <- 1
	I_T<-matrix(0,T,T)
	I_T[row(I_T)==col(I_T)] <- 1
	D1<-data.frame()
	for(t in tim){
		D1<-rbind(D1,I_N)
	}
	D1<-data.matrix(D1[as.vector(table(i.ind,t.ind))==1,])
	D2<-data.frame()
	for(i in nam){
		D2<-rbind(D2,I_T)
	}
	D2<-data.matrix(D2[as.vector(table(t.ind,i.ind))==1,])
	D<-data.matrix(cbind(D1,D2))
Q<-I-D%*%ginv(crossprod(D))%*%t(D)#fQ(D1)-fQ(D1)%*%D2%*%ginv(t(D2)%*%fQ(D1)%*%D2)%*%t(D2)%*%fQ(D1)
	Q
}
##############################
  
library(plm)
library(lmtest)
data(Grunfeld)
y_i<-Grunfeld$inv-ave(Grunfeld$inv,index=Grunfeld$firm)
y_t<-Grunfeld$inv-ave(Grunfeld$inv,index=Grunfeld$year)
y_it<-(Grunfeld$inv-ave(Grunfeld$inv,index=Grunfeld$firm)-ave(Grunfeld$inv,index=Grunfeld$year)+rep(mean(Grunfeld$inv),length(Grunfeld$inv)))
x1_it<-(Grunfeld$value-ave(Grunfeld$value,index=Grunfeld$firm)-ave(Grunfeld$value,index=Grunfeld$year)+rep(mean(Grunfeld$value),length(Grunfeld$inv)))
dem_y_i<-pmodel.response(plm(formula=inv~value,data=Grunfeld,index=c("firm","year"),model="within",effect="individual"))
dem_y_t<-pmodel.response(plm(formula=inv~value,data=Grunfeld,index=c("firm","year"),model="within",effect="time"))
dem_y_it<-pmodel.response(plm(formula=inv~value,data=Grunfeld,index=c("firm","year"),model="within",effect="twoways"))
dem_X_it<-model.matrix(plm(formula=inv~value,data=Grunfeld,index=c("firm","year"),model="within",effect="twoways"))
sum(y_i!=dem_y_i)
#y_i[1:10]
#dem_y_i[1:10]
sum(y_t!=dem_y_t)
#y_t[1:10]
#dem_y_t[1:10]
sum(y_it!=dem_y_it)
#y_it[1:10]
#dem_y_it[1:10]
sum(x1_it!=dem_X_it)
#x1_it[1:10]
#dem_X_it[1:10,]
(getQ(Grunfeld,t.index="year",i.index="firm")%*%Grunfeld$inv)[1:10]
(getQ(Grunfeld,t.index="year",i.index="firm")%*%y_it)[1:10]
dem_y_it[1:10]
############################
Grunfeld1<-Grunfeld[-1,]
sum(ave(Grunfeld1$inv,index=Grunfeld1$firm)!=(tapply(Grunfeld1$inv,Grunfeld1$firm,function(x){mean(x)})[Grunfeld1$firm]))==0
y_i<-Grunfeld1$inv-ave(Grunfeld1$inv,index=Grunfeld1$firm)
y_t<-Grunfeld1$inv-ave(Grunfeld1$inv,index=Grunfeld1$year)
y_it<-(Grunfeld1$inv-ave(Grunfeld1$inv,index=Grunfeld1$firm)-ave(Grunfeld1$inv,index=Grunfeld1$year)+rep(mean(Grunfeld1$inv),length(Grunfeld1$inv)))
x1_it<-(Grunfeld1$value-ave(Grunfeld1$value,index=Grunfeld1$firm)-ave(Grunfeld1$value,index=Grunfeld1$year)+rep(mean(Grunfeld1$value),length(Grunfeld1$inv)))
dem_y_i<-pmodel.response(plm(formula=inv~value,data=Grunfeld1,index=c("firm","year"),model="within",effect="individual"))
dem_y_t<-pmodel.response(plm(formula=inv~value,data=Grunfeld1,index=c("firm","year"),model="within",effect="time"))
dem_y_it<-pmodel.response(plm(formula=inv~value,data=Grunfeld1,index=c("firm","year"),model="within",effect="twoways"))
dem_X_it<-model.matrix(plm(formula=inv~value,data=Grunfeld1,index=c("firm","year"),model="within",effect="twoways"))
sum(y_i!=dem_y_i)
#y_i[1:10]
#dem_y_i[1:10]
sum(y_t!=dem_y_t)
#y_t[1:10]
#dem_y_t[1:10]
sum(y_it!=dem_y_it)
#y_it[1:10]
#dem_y_it[1:10]
#y_it[1:10]-dem_y_it[1:10]
sum(x1_it!=dem_X_it)
#x1_it[1:10]
#dem_X_it[1:10,]
#x1_it[1:10]-dem_X_it[1:10,]
(getQ(Grunfeld1,t.index="year",i.index="firm")%*%Grunfeld1$inv)[1:10]
(getQ(Grunfeld1,t.index="year",i.index="firm")%*%y_it)[1:10]
dem_y_it[1:10]
-----
____________________________________
EBS Universitaet fuer Wirtschaft und Recht
FARE Department
Wiesbaden/ Germany
http://www.ebs.edu/index.php?id=finacc&L=0
--
View this message in context:
http://r.789695.n4.nabble.com/Manual-two-way-demeaning-of-unbalanced-panel-data-Wansbeek-Kapteyn-transformation-tp4655202.html
Sent from the R help mailing list archive at Nabble.com.
Possibly Parallel Threads
- Unexpected result with lag() et diff() in plm package.
- lme with corAR1 errors - can't find AR coefficient in output
- panel regression with twoways random effects, on unbalanced data?
- singular matrices in plm::pgmm()
- How can I extract part of the data in a panel dataset?
