Philipp Grueber
2013-Nov-30 14:03 UTC
[R] Calculate Adjusted vcov Matrix acc. to Shanken(1992) (Generated Regressor Problem)
Dear R Users, I wish to estimate a regression: y=a+b x1+c x2+d x3+e where a is the constant, b,c,d are coefficients and e represents the residuals. However, I find x1 and x2 to correlate. In order to avoid multicollinearity, I split up the estimation: (1) x2~a1+ b1 x1 + e1 (2) y=a2+b2 x1+ c2 e1+ d2 x3+e2 I believe that using the residuals from regression (1) in (2) induces an ?generated regressor problem? which biases the standard errors obtained (I am not quite sure because both x1 and the error e1 both are included in my second regression). (Shanken1992) provides an adjustment which however refers to a panel setup. Obviously, my setup is unrelated to panels. Question: If being applicable to my setup at all, is there an R implementation of Shanken?s adjustment? See my example below. I try to replicate the explanation provided in Ch.12 of John Chohcrane's "Asset Pricing" book (page 237ff). Any help is highly appreciated. Thank you very much in advance! Best wishes, Philipp PS: Note that a similar question was asked by someone else only few days ago in the stackexchange network. See http://stats.stackexchange.com/questions/77617/shanken-1992-correction-for-t-statistics ################################################################## ################################################################## ################################################################## #packages library(lmtest) #Data y<-rnorm(100) x1<-rnorm(100) x2<-rnorm(100) x3<-rnorm(100) #I wish to estimate y=c+x1+x2+x3+u but cor(x1,x2) is high. Therefore twostep procedure: (1) x2~c1+x1+u1 (2) y=c2+x1+u1+x3+u2. res1<-lm(x2~x1) coeftest(res1,vcov.=vcov(res1)) vcov(res1) #I am able to calculate this manually. X1<-as.matrix(data.frame(A=rep(1,100),B=x1),stringsAsFactors=FALSE) betas<-solve(crossprod(X1,X1))%*%t(X1)%*%x2 round(coef(res1),digits=10)==round(betas,digits=10) betas u1<-resid(res1) n1<-length(y) k1<-ncol(X1) vcov1 = 1/(n1-k1) * as.numeric(t(u1)%*%u1) * solve(t(X1)%*%X1) #= 1/T vcov1 #Now I do my second regression: Y<-as.matrix(y) X2<-as.matrix(data.frame(A=1,B=x1,C=u1,D=x3)) res2<-lm(y~x1+u1+x3) coeftest(res2,vcov.=vcov(res2)) vcov(res2) #This is the (biased) variance-covariance matrix: u2<-resid(res2) n2<-length(y) k2<-ncol(X2) vcov2 = 1/(n2-k2) * as.numeric(t(u2)%*%u2) * solve(t(X2)%*%X2) #= 1/T vcov2 ############## ############## # So far, this was easy, but in order to implement the Shanken(1992) adjustment, I need to calculate the variance-covariance matrix manually in the following way: (See Chohcrane (2005) Asset Pricing Ch.12): var_coef<-function(x){ x_mm<-model.matrix(x) x_s2<-summary(x)$sigma^2 solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(x_s2*diag(nobs(x)))%*%x_mm%*%solve(t(x_mm)%*%x_mm) } vcov_manual<-var_coef(x=res1) vcov1 round(vcov1,digits=10)==round(vcov_manual,digits=10) ################################## # This is where I begin to question whether Shanken's adjustment makes sense in my setup at all. ################################## #I calculate the covariance matrix of the residuals cov(u,u') (note: I am not sure what it actually tells me / whether it makes sense in my setup) cov_resid<-function(x){ x_mm<-model.matrix(x) x_s2<-summary(x)$sigma^2 (diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm))%*%(x_s2*diag(nobs(x)))%*%t(diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm)) } cov_resid<-cov_resid(x=res1) cov_resid[95:100,95:100] #to show only a part #Another way to go to: H1<-X1%*%solve(t(X1)%*%X1)%*%t(X1) ((diag(100)-H1)*summary(res1)$sigma^2)[95:100,95:100] round(((diag(100)-H1)*summary(res1)$sigma^2)[95:100,95:100],digits=10)==round(cov_resid[95:100,95:100],digits=10) ################################## # This is where I really struggle. I do not know how to get cov(x1,x1'). ################################## #The error covariance of the second regression should be H1<-X1%*%solve(t(X1)%*%X1)%*%t(X1) sigma<-((diag(100)-H1)*summary(res1)$sigma^2) sigma_f<-"???" #Since the variance-covariance matrix of the second regression should be vcov2=1/T*((b'b)^(-1)b'Sb(b'b)-sigma_f) I assume: x=X2 sigma2_f<-(solve(t(x)%*%x)%*%t(x)%*%(sigma)%*%x%*%solve(t(x)%*%x))-vcov2*100 #But then the error covariance cannot be calculated as in Chochrane: err_cov<-1/100*(coef(res1)%*%sigma2_f%*%coef(res1)+sigma) ################################## # And then? ################################## #Given I was able to derive sigma and sigma_f correctly (and that they made sense), I could then include the Shanken adjustment: ...*(1+t(lambdas)%*%solve(sig_f)%*%lambdas)... var_coef_adj<-function(x,sig,sig_f){ lambdas<-coef(x) x_mm<-model.matrix(x) x_s2<-summary(x)$sigma^2 1/100*(solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(sig)%*%x_mm%*%solve(t(x_mm)%*%x_mm) *(1+t(lambdas)%*%solve(sig_f)%*%lambdas) +sig_f) } #...and thus, hope to obtain the corrected variance-covariance matrix for the second regression vcov2_adj<-var_coef_adj(x=res2, sig=sigma, sig_f=sigma2_f) ################################################################## ################################################################## ################################################################## ----- ____________________________________ EBS Universitaet fuer Wirtschaft und Recht FARE Department Wiesbaden/ Germany http://www.ebs.edu/index.php?id=finacc&L=0 -- View this message in context: http://r.789695.n4.nabble.com/Calculate-Adjusted-vcov-Matrix-acc-to-Shanken-1992-Generated-Regressor-Problem-tp4681404.html Sent from the R help mailing list archive at Nabble.com.
Philipp Grueber
2013-Dec-04 13:43 UTC
[R] Calculate Adjusted vcov Matrix acc. to Shanken(1992) (Generated Regressor Problem)
Dear R Users, please find attached what I believe to be the solution to my problem. Note that I am still not 100% sure if my approach really does what it is intended to do and if it is applicable to my case at all. Any comment or correction is highly appreciated. Best wishes, Philipp ################################################################## ################################################################## ################################################################## #packages library(lmtest) #Data y<-rnorm(100) x1<-rnorm(100) x2<-x1+rnorm(100)/5 x3<-rnorm(100) #Simulated multicollinearity in the data plot(x1,x2) cor(x1,x2) #I wish to estimate y=c+x1+x2+x3+u but cor(x1,x2) is high. Therefore twostep procedure: (1) x2~c1+x1+u1 (2) y=c2+x1+u1+x3+u2. res1<-lm(x2~x1) coeftest(res1,vcov.=vcov(res1)) vcov(res1) #I am able to calculate this manually. X1<-as.matrix(data.frame(A=rep(1,100),B=x1),stringsAsFactors=FALSE) betas<-solve(crossprod(X1,X1))%*%t(X1)%*%x2 round(coef(res1),digits=10)==round(betas,digits=10) betas u1<-resid(res1) n1<-length(y) k1<-ncol(X1) vcov1 = 1/(n1-k1) * as.numeric(t(u1)%*%u1) * solve(t(X1)%*%X1) #= 1/T vcov1 #Now I do my second regression: Y<-as.matrix(y) X2<-as.matrix(data.frame(A=1,B=x1,C=u1,D=x3)) res2<-lm(y~x1+u1+x3) coeftest(res2,vcov.=vcov(res2)) vcov(res2) #This is the (biased) variance-covariance matrix: u2<-resid(res2) n2<-length(y) k2<-ncol(X2) vcov2 = 1/(n2-k2) * as.numeric(t(u2)%*%u2) * solve(t(X2)%*%X2) #= 1/T vcov2 #In order to implement the Shanken(1992) adjustment, I need to calculate the variance-covariance matrix manually in the following way: (See Chochrane (2005) Asset Pricing Ch.12): var_coef<-function(x){ x_mm<-model.matrix(x) x_s2<-summary(x)$sigma^2 solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(x_s2*diag(nobs(x)))%*%x_mm%*%solve(t(x_mm)%*%x_mm) } vcov_manual<-var_coef(x=res1) vcov1 round(vcov1,digits=10)==round(vcov_manual,digits=10) #I calculate the covariance matrix of the residuals cov(u,u'). cov_resid<-function(x){ x_mm<-model.matrix(x) x_s2<-summary(x)$sigma^2 (diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm))%*%(x_s2*diag(nobs(x)))%*%t(diag(nobs(x))-x_mm%*%solve(t(x_mm)%*%x_mm)%*%t(x_mm)) } cov_resid<-cov_resid(x=res1) cov_resid[95:100,95:100] #to show only a part #Another way to go to: H1<-X1%*%solve(t(X1)%*%X1)%*%t(X1) ((diag(100)-H1)*summary(res1)$sigma^2)[95:100,95:100] #Seems to be correct round(((diag(100)-H1)*summary(res1)$sigma^2)[95:100,95:100],digits=10)==round(cov_resid[95:100,95:100],digits=10) #Now I include the Shanken adjustment: ...*(1+t(lambdas)%*%solve(sig_f)%*%lambdas)... var_coef_adj<-function(x_1,x_2){ lambdas<-coef(x_2) x_mm<-model.matrix(x_2) x_s<-((diag(100)-H1)*summary(x_1)$sigma^2) x_s_f<-vcov(x_2) 1/1*(solve(t(x_mm)%*%x_mm)%*%t(x_mm)%*%(x_s)%*%x_mm%*%solve(t(x_mm)%*%x_mm)*as.numeric(1+t(lambdas)%*%solve(x_s_f)%*%lambdas)+x_s_f) } vcov2_adj<-var_coef_adj(x_1=res1, x_2=res2) var_coef_adj(x_1=res1, x_2=res2) coeftest(res2) coeftest(res2,vcov.=vcov2_adj) ----- ____________________________________ EBS Universitaet fuer Wirtschaft und Recht FARE Department Wiesbaden/ Germany http://www.ebs.edu/index.php?id=finacc&L=0 -- View this message in context: http://r.789695.n4.nabble.com/Calculate-Adjusted-vcov-Matrix-acc-to-Shanken-1992-Generated-Regressor-Problem-tp4681404p4681631.html Sent from the R help mailing list archive at Nabble.com.