Philipp Grueber
2013-Nov-30 14:03 UTC
[R] Calculate Adjusted vcov Matrix acc. to Shanken(1992) (Generated Regressor Problem)
Dear R Users,
I wish to estimate a regression:
y=a+b x1+c x2+d x3+e
where a is the constant, b,c,d are coefficients and e represents the
residuals. However, I find x1 and x2 to correlate. In order to avoid
multicollinearity, I split up the estimation:
(1) x2~a1+ b1 x1 + e1
(2) y=a2+b2 x1+ c2 e1+ d2 x3+e2
I believe that using the residuals from regression (1) in (2) induces an
?generated regressor problem? which biases the standard errors obtained (I
am not quite sure because both x1 and the error e1 both are included in my
second regression). (Shanken1992) provides an adjustment which however
refers to a panel setup. Obviously, my setup is unrelated to panels.
Question:
If being applicable to my setup at all, is there an R implementation of
Shanken?s adjustment? See my example below. I try to replicate the
explanation provided in Ch.12 of John Chohcrane's "Asset Pricing"
book (page
237ff).
Any help is highly appreciated. Thank you very much in advance!
Best wishes,
Philipp
PS: Note that a similar question was asked by someone else only few days ago
in the stackexchange network. See
http://stats.stackexchange.com/questions/77617/shanken-1992-correction-for-t-statistics
##################################################################
##################################################################
##################################################################
#packages
library(lmtest)
#Data
y<-rnorm(100)
x1<-rnorm(100)
x2<-rnorm(100)
x3<-rnorm(100)
#I wish to estimate y=c+x1+x2+x3+u but cor(x1,x2) is high. Therefore twostep
procedure: (1) x2~c1+x1+u1 (2) y=c2+x1+u1+x3+u2.
res1<-lm(x2~x1)
coeftest(res1,vcov.=vcov(res1))
vcov(res1)
#I am able to calculate this manually.
X1<-as.matrix(data.frame(A=rep(1,100),B=x1),stringsAsFactors=FALSE)
betas<-solve(crossprod(X1,X1))%*%t(X1)%*%x2
round(coef(res1),digits=10)==round(betas,digits=10)
betas
u1<-resid(res1)
n1<-length(y)
k1<-ncol(X1)
vcov1 = 1/(n1-k1) * as.numeric(t(u1)%*%u1) * solve(t(X1)%*%X1) #= 1/T
vcov1
#Now I do my second regression:
Y<-as.matrix(y)
X2<-as.matrix(data.frame(A=1,B=x1,C=u1,D=x3))
res2<-lm(y~x1+u1+x3)
coeftest(res2,vcov.=vcov(res2))
vcov(res2)
#This is the (biased) variance-covariance matrix:
u2<-resid(res2)
n2<-length(y)
k2<-ncol(X2)
vcov2 = 1/(n2-k2) * as.numeric(t(u2)%*%u2) * solve(t(X2)%*%X2) #= 1/T
vcov2
##############
##############
# So far, this was easy, but in order to implement the Shanken(1992)
adjustment, I need to calculate the variance-covariance matrix manually in
the following way: (See Chohcrane (2005) Asset Pricing Ch.12):
var_coef<-function(x){
x_mm<-model.matrix(x)
x_s2<-summary(x)$sigma^2
solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(x_s2*diag(nobs(x)))%*%x_mm%*%solve(t(x_mm)%*%x_mm)
}
vcov_manual<-var_coef(x=res1)
vcov1
round(vcov1,digits=10)==round(vcov_manual,digits=10)
##################################
# This is where I begin to question whether Shanken's adjustment makes sense
in my setup at all.
##################################
#I calculate the covariance matrix of the residuals cov(u,u') (note: I am
not sure what it actually tells me / whether it makes sense in my setup)
cov_resid<-function(x){
x_mm<-model.matrix(x)
x_s2<-summary(x)$sigma^2
(diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm))%*%(x_s2*diag(nobs(x)))%*%t(diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm))
}
cov_resid<-cov_resid(x=res1)
cov_resid[95:100,95:100] #to show only a part
#Another way to go to:
H1<-X1%*%solve(t(X1)%*%X1)%*%t(X1)
((diag(100)-H1)*summary(res1)$sigma^2)[95:100,95:100]
round(((diag(100)-H1)*summary(res1)$sigma^2)[95:100,95:100],digits=10)==round(cov_resid[95:100,95:100],digits=10)
##################################
# This is where I really struggle. I do not know how to get cov(x1,x1').
##################################
#The error covariance of the second regression should be
H1<-X1%*%solve(t(X1)%*%X1)%*%t(X1)
sigma<-((diag(100)-H1)*summary(res1)$sigma^2)
sigma_f<-"???"
#Since the variance-covariance matrix of the second regression should be
vcov2=1/T*((b'b)^(-1)b'Sb(b'b)-sigma_f) I assume:
x=X2
sigma2_f<-(solve(t(x)%*%x)%*%t(x)%*%(sigma)%*%x%*%solve(t(x)%*%x))-vcov2*100
#But then the error covariance cannot be calculated as in Chochrane:
err_cov<-1/100*(coef(res1)%*%sigma2_f%*%coef(res1)+sigma)
##################################
# And then?
##################################
#Given I was able to derive sigma and sigma_f correctly (and that they made
sense), I could then include the Shanken adjustment:
...*(1+t(lambdas)%*%solve(sig_f)%*%lambdas)...
var_coef_adj<-function(x,sig,sig_f){
lambdas<-coef(x)
x_mm<-model.matrix(x)
x_s2<-summary(x)$sigma^2
1/100*(solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(sig)%*%x_mm%*%solve(t(x_mm)%*%x_mm)
*(1+t(lambdas)%*%solve(sig_f)%*%lambdas) +sig_f)
}
#...and thus, hope to obtain the corrected variance-covariance matrix for
the second regression
vcov2_adj<-var_coef_adj(x=res2, sig=sigma, sig_f=sigma2_f)
##################################################################
##################################################################
##################################################################
-----
____________________________________
EBS Universitaet fuer Wirtschaft und Recht
FARE Department
Wiesbaden/ Germany
http://www.ebs.edu/index.php?id=finacc&L=0
--
View this message in context:
http://r.789695.n4.nabble.com/Calculate-Adjusted-vcov-Matrix-acc-to-Shanken-1992-Generated-Regressor-Problem-tp4681404.html
Sent from the R help mailing list archive at Nabble.com.
Philipp Grueber
2013-Dec-04 13:43 UTC
[R] Calculate Adjusted vcov Matrix acc. to Shanken(1992) (Generated Regressor Problem)
Dear R Users,
please find attached what I believe to be the solution to my problem. Note
that I am still not 100% sure if my approach really does what it is intended
to do and if it is applicable to my case at all.
Any comment or correction is highly appreciated.
Best wishes,
Philipp
##################################################################
##################################################################
##################################################################
#packages
library(lmtest)
#Data
y<-rnorm(100)
x1<-rnorm(100)
x2<-x1+rnorm(100)/5
x3<-rnorm(100)
#Simulated multicollinearity in the data
plot(x1,x2)
cor(x1,x2)
#I wish to estimate y=c+x1+x2+x3+u but cor(x1,x2) is high. Therefore twostep
procedure: (1) x2~c1+x1+u1 (2) y=c2+x1+u1+x3+u2.
res1<-lm(x2~x1)
coeftest(res1,vcov.=vcov(res1))
vcov(res1)
#I am able to calculate this manually.
X1<-as.matrix(data.frame(A=rep(1,100),B=x1),stringsAsFactors=FALSE)
betas<-solve(crossprod(X1,X1))%*%t(X1)%*%x2
round(coef(res1),digits=10)==round(betas,digits=10)
betas
u1<-resid(res1)
n1<-length(y)
k1<-ncol(X1)
vcov1 = 1/(n1-k1) * as.numeric(t(u1)%*%u1) * solve(t(X1)%*%X1) #= 1/T
vcov1
#Now I do my second regression:
Y<-as.matrix(y)
X2<-as.matrix(data.frame(A=1,B=x1,C=u1,D=x3))
res2<-lm(y~x1+u1+x3)
coeftest(res2,vcov.=vcov(res2))
vcov(res2)
#This is the (biased) variance-covariance matrix:
u2<-resid(res2)
n2<-length(y)
k2<-ncol(X2)
vcov2 = 1/(n2-k2) * as.numeric(t(u2)%*%u2) * solve(t(X2)%*%X2) #= 1/T
vcov2
#In order to implement the Shanken(1992) adjustment, I need to calculate the
variance-covariance matrix manually in the following way: (See Chochrane
(2005) Asset Pricing Ch.12):
var_coef<-function(x){
x_mm<-model.matrix(x)
x_s2<-summary(x)$sigma^2
solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(x_s2*diag(nobs(x)))%*%x_mm%*%solve(t(x_mm)%*%x_mm)
}
vcov_manual<-var_coef(x=res1)
vcov1
round(vcov1,digits=10)==round(vcov_manual,digits=10)
#I calculate the covariance matrix of the residuals cov(u,u').
cov_resid<-function(x){
x_mm<-model.matrix(x)
x_s2<-summary(x)$sigma^2
(diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm))%*%(x_s2*diag(nobs(x)))%*%t(diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm))
}
cov_resid<-cov_resid(x=res1)
cov_resid[95:100,95:100] #to show only a part
#Another way to go to:
H1<-X1%*%solve(t(X1)%*%X1)%*%t(X1)
((diag(100)-H1)*summary(res1)$sigma^2)[95:100,95:100]
#Seems to be correct
round(((diag(100)-H1)*summary(res1)$sigma^2)[95:100,95:100],digits=10)==round(cov_resid[95:100,95:100],digits=10)
#Now I include the Shanken adjustment:
...*(1+t(lambdas)%*%solve(sig_f)%*%lambdas)...
var_coef_adj<-function(x_1,x_2){
lambdas<-coef(x_2)
x_mm<-model.matrix(x_2)
x_s<-((diag(100)-H1)*summary(x_1)$sigma^2)
x_s_f<-vcov(x_2)
1/1*(solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(x_s)%*%x_mm%*%solve(t(x_mm)%*%x_mm)*as.numeric(1+t(lambdas)%*%solve(x_s_f)%*%lambdas)+x_s_f)
}
vcov2_adj<-var_coef_adj(x_1=res1, x_2=res2)
var_coef_adj(x_1=res1, x_2=res2)
coeftest(res2)
coeftest(res2,vcov.=vcov2_adj)
-----
____________________________________
EBS Universitaet fuer Wirtschaft und Recht
FARE Department
Wiesbaden/ Germany
http://www.ebs.edu/index.php?id=finacc&L=0
--
View this message in context:
http://r.789695.n4.nabble.com/Calculate-Adjusted-vcov-Matrix-acc-to-Shanken-1992-Generated-Regressor-Problem-tp4681404p4681631.html
Sent from the R help mailing list archive at Nabble.com.