Cecilia Carmo
2013-Apr-03 08:38 UTC
[R] linear model coefficients by year and industry, fitted values, residuals, panel data
Hi R-helpers,

My real data is an unbalanced panel (with gaps in years) of thousands of firms, by year and industry, with financial information (variables X, Y and Z, for example). The number of firms per year and industry is not always the same, and neither is the number of years per industry.

#reproducible example
# industry 20: firms 1-10, years 2000-2004
firm1<-sort(rep(1:10,5),decreasing=FALSE)
year1<-rep(2000:2004,10)
industry1<-rep(20,50)
X<-rnorm(50)
Y<-rnorm(50)
Z<-rnorm(50)
data1<-data.frame(firm1,year1,industry1,X,Y,Z)
data1
colnames(data1)<-c("firm","year","industry","X","Y","Z")

# industry 30: firms 11-15, years 2001-2003
firm2<-sort(rep(11:15,3),decreasing=FALSE)
year2<-rep(2001:2003,5)
industry2<-rep(30,15)
X<-rnorm(15)
Y<-rnorm(15)
Z<-rnorm(15)
data2<-data.frame(firm2,year2,industry2,X,Y,Z)
data2
colnames(data2)<-c("firm","year","industry","X","Y","Z")

# industry 40: firms 16-20, years 2001-2004
firm3<-sort(rep(16:20,4),decreasing=FALSE)
year3<-rep(2001:2004,5)
industry3<-rep(40,20)
X<-rnorm(20)
Y<-rnorm(20)
Z<-rnorm(20)
data3<-data.frame(firm3,year3,industry3,X,Y,Z)
data3
colnames(data3)<-c("firm","year","industry","X","Y","Z")

# stack the three industries and sort by industry, then year
final1<-rbind(data1,data2)
final2<-rbind(final1,data3)
final2
final3<-final2[order(final2$industry,final2$year),]
final3

I need to estimate a linear model Y = b0 + b1X + b2Z by industry and year, to obtain the estimates of b0, b1 and b2 for each industry and year (for example, I need b0 for industry 20 and year 2000, for industry 20 and year 2001, ...). Then I need to calculate the fitted values (Yhat) and the residuals by firm, so I need to keep b0, b1 and b2 in a way that lets me do something like

newdata1<-transform(final3,Yhat=b0+b1*X+b2*Z)
newdata2<-transform(newdata1,residual=Y-Yhat)

or some other way to keep Yhat and the residuals in a data frame with the columns firm and year.

Until now I have been doing this in a very hard way, and because I need to do it several times, I need your help to find an easier way.

Thank you,

Cecília Carmo
Universidade de Aveiro
Portugal
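In symbols, the request is for a separate set of coefficients in every industry-year cell. The indices used here (i for firm, j for industry, t for year) and the error term e are notation introduced for this sketch, not taken from the original post:

```latex
\begin{aligned}
Y_{ijt} &= b_{0,jt} + b_{1,jt} X_{ijt} + b_{2,jt} Z_{ijt} + e_{ijt} \\
\hat{Y}_{ijt} &= \hat{b}_{0,jt} + \hat{b}_{1,jt} X_{ijt} + \hat{b}_{2,jt} Z_{ijt} \\
\hat{e}_{ijt} &= Y_{ijt} - \hat{Y}_{ijt}
\end{aligned}
```

Each firm's fitted value and residual therefore use the coefficients estimated for that firm's own industry j and year t.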
Adams, Jean
2013-Apr-03 12:41 UTC
Cecilia,

Thanks for providing a reproducible example. Excellent.

You could use the ddply() function in the plyr package to fit the model for each industry and year, keep the coefficients, and then compute the fitted values and residuals.

Jean

library(plyr)
# one row of coefficients per industry-year cell
coefs <- ddply(final3, .(industry, year), function(dat) coef(lm(Y ~ X + Z, data=dat)))
names(coefs) <- c("industry", "year", "b0", "b1", "b2")
# merge() matches rows on the shared columns industry and year
final4 <- merge(final3, coefs)
newdata1 <- transform(final4, Yhat = b0 + b1*X + b2*Z)
newdata2 <- transform(newdata1, residual = Y - Yhat)
# residuals grouped by firm
plot(as.factor(newdata2$firm), newdata2$residual)
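For comparison, here is a base-R sketch of the same task without plyr. It is not from the thread: it assumes split() plus lapply() is acceptable, and it takes the fitted values and residuals directly from each lm() fit with fitted() and residuals() instead of recomputing b0 + b1*X + b2*Z by hand. The first lines rebuild a panel shaped like final3 (set.seed() is added only so the sketch is reproducible); the names cells, pieces and newdata are illustrative.

```r
# Rebuild a panel with the same structure as final3 above
# (set.seed() is an addition so the sketch gives repeatable numbers).
set.seed(1)
final3 <- rbind(
  data.frame(firm = rep(1:10, each = 5),  year = rep(2000:2004, 10), industry = 20),
  data.frame(firm = rep(11:15, each = 3), year = rep(2001:2003, 5),  industry = 30),
  data.frame(firm = rep(16:20, each = 4), year = rep(2001:2004, 5),  industry = 40)
)
final3$X <- rnorm(nrow(final3))
final3$Y <- rnorm(nrow(final3))
final3$Z <- rnorm(nrow(final3))
final3 <- final3[order(final3$industry, final3$year), ]

# One data frame per industry-year cell; empty combinations are dropped.
cells <- split(final3, list(final3$industry, final3$year), drop = TRUE)

# Fit Y ~ X + Z in each cell and keep firm/year next to the cell's
# coefficients, lm's fitted values and lm's residuals.
pieces <- lapply(cells, function(d) {
  fit <- lm(Y ~ X + Z, data = d)
  b <- coef(fit)
  data.frame(firm = d$firm, year = d$year, industry = d$industry,
             b0 = b[[1]], b1 = b[["X"]], b2 = b[["Z"]],
             Yhat = fitted(fit), residual = residuals(fit))
})

# Stack the cells back into a single data frame.
newdata <- do.call(rbind, pieces)
rownames(newdata) <- NULL
head(newdata)
```

One design note: in the real data, a cell with fewer than three firms cannot identify all three coefficients. lm() then reports NA for the coefficients it cannot estimate rather than stopping, so such cells would show NA in b1 or b2 and are worth filtering or flagging.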