Patrick McKann
2011-Feb-07 21:37 UTC
[R] Combining the results from two simple linear regression models
Hi all. This is more of a stats question, I suppose. Let's say I have two separate simple regressions of weight on year from two different datasets. I want to combine the regressions so that I can come up with a single equation for the total weight regressed on year. In reality, there is missing data, so I can't just sum the data across datasets and come up with a regression on the summed data. Below is a program to reproduce what I am trying to figure out. ### aslp=50 bslp=-50 sda=20 sdb=100 yrs=0:10 a= rnorm(11,100,sda)+aslp*yrs b= rnorm (11,1000,sdb)+bslp*yrs ma=lm(a~yrs) mb=lm(b~yrs) pra=predict(ma,data.frame(yrs=yrs),interval='confidence') prb=predict(mb,data.frame(yrs=yrs),interval='confidence') ##combine the two regressions for a single equation with confidence intervals pr=pra+prb###it couldn't be this simple, could it? #by hand co=coef(ma)+coef(mb) new.sigma=sqrt(summary(ma)$sigma^2+summary(mb)$sigma^2) fit=co[1]+co[2]*yrs lwr= fit - qt(.975,9)*sqrt( new.sigma^2 * ( (1/11) + ( (yrs-mean(yrs))^2)/sum((yrs-mean(yrs))^2) ) )#the df are probably wrong (the 9 in the qt upr= fit + qt(.975,9)*sqrt( new.sigma^2 * ( (1/11) + ( (yrs-mean(yrs))^2)/sum((yrs-mean(yrs))^2) ) )# statement) I can't print the graph here, so here's code for it... #graph plot(a,ylim=c(0,1500)) points(b) lines(ma$fit) lines(mb$fit) lines(pra[,'lwr'],lty=2);lines(pra[,'upr'],lty=2) lines(prb[,'lwr'],lty=2);lines(prb[,'upr'],lty=2) lines(pr[,'fit'],col='grey') lines(pr[,'upr'],lty=2,col='grey') lines(pr[,'lwr'],lty=2,col='grey') lines(fit,col='grey',lty=3) lines(lwr,col='grey',lty=3) lines(upr,col='grey',lty=3) legend(1,1500,lty=c(2,3),legend=c('adding using predict','adding by hand'),col='grey') As you can see, the plots of the two different methods for producing the confidence intervals don't match. What I suspect is wrong in my by-hand formula is the degrees of freedom (and n (the 11)), but if anything I think those would increase, which would make the confidence intervals even smaller than plainly summing the confidence intervals provided by the predict function at each x. I'm just wondering how to do this correctly, and am not sure if either approach is correct. If anybody would like to weigh in, I would appreciate it. Please feel free to point out the obvious. Thanks, Patrick [[alternative HTML version deleted]]