Clara Yuan
2009-Dec-17 22:14 UTC
[R] Testing equality of regression model on multiple groups
Hello, I'm trying to test for the joint equality of coefficients of the same model across different subsets of the data (ie, is it necessary to estimate the same model on these different populations, or can I just estimate the model once on the whole dataset?). My plan is to use the F-test on the reduced model and the full model. By full model, I mean a specification that mimics my regressions on separate subsets of data, but I have found that the full model's coefficient estimates don't correspond to my original model's estimates. I was under the impression that they would be identical. Original model:> lm.separate = by(data.ex, data.ex$t, function(x) lm(y ~ x1 * x2, data = x))Full model:> lm.together = lm(y ~ t * x1 * x2, data = data.ex)The data are grouped by t. When I examine the coefficients, I find that they are roughly in the same ballpark, but not nearly identical:> sapply(lm.separate, coef)1 2 3 4 (Intercept) 2.691272263 1.7153565472 1.8797914048 1.9282332240 x1 0.107520721 0.0472488208 0.0440171489 0.0198376096 x2 0.054694784 0.0396246366 0.0603665574 0.0300886164 x1:x2 0.002180749 0.0003653858 -0.0001488267 -0.0007409421> coef(lm.together)(Intercept) t.L t.Q t.C x1 2.0536633597 -0.4750933962 0.5121787674 -0.2809269719 0.0546560750 x2 t.L:x1 t.Q:x1 t.C:x1 t.L:x2 0.0461936485 -0.0595422428 0.0180461803 -0.0174386682 -0.0118682844 t.Q:x2 t.C:x2 x1:x2 t.L:x1:x2 t.Q:x1:x2 -0.0076038969 -0.0194162097 0.0004140914 -0.0020749112 0.0006116237 t.C:x1:x2 -0.0003083657 (Also, why are the coefficients renamed to t.L, t.Q, etc instead of t.1, t.2?) What am I missing? Thanks for the help, Clara
Daniel Malter
2009-Dec-18 10:22 UTC
[R] Testing equality of regression model on multiple groups
Hi, your question is unclear. It is not clear whether you want to compare a.) whether two subsets of data have the same coefficients for an identical model or b.) whether a couple of coefficients are different once you include additional regressors. The reason for this confusion is that your lm.separate and lm.together are not the same models (i.e., they do not include the same regressors). The former is (y~x1*x2); the latter is (y~t*x1*x2), which is obviously different. If a.) is your goal, you want to run a Chow test. For that, you run the regression (y~x1*x2) on the entire dataset and on the two subsets of the data separately. If the model on the entire data produces much larger errors (in sum) than the two regressions on the subsets of data (in sum), then this is evidence that the data-generating processes for the two subsets differ significantly. In other words, the coefficients for the subsets will jointly be significantly different from each other. This is, as you said, an F-Test. Look here for further details: http://en.wikipedia.org/wiki/Chow_test It just beckons me that you probably want to compare multiple groups. If you want to run (y~x1*x2) interacted with the group indicator, i.e. (y~group*x1*x2), you should code your group indicator as "treatment contrasts." The t.L, t.Q, and t.C indicate a.) that t is a factor variable and b.) that this factor variable is coded as orthogonal polynomial contrasts. This tests for linear, quadractic, cubic, etc. influence of t on the dependent variable. This makes sense if t has constant differences between the factor levels (like 10 cm, 15 cm, 20 cm, 25 cm). Otherwise, you probably want dummy-variable coding, which is called "treatment contrasts" in R lingo. HTH, Daniel ------------------------- cuncta stricte discussurus ------------------------- -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Clara Yuan Sent: Thursday, December 17, 2009 5:15 PM To: r-help at lists.R-project.org Subject: [R] Testing equality of regression model on multiple groups Hello, I'm trying to test for the joint equality of coefficients of the same model across different subsets of the data (ie, is it necessary to estimate the same model on these different populations, or can I just estimate the model once on the whole dataset?). My plan is to use the F-test on the reduced model and the full model. By full model, I mean a specification that mimics my regressions on separate subsets of data, but I have found that the full model's coefficient estimates don't correspond to my original model's estimates. I was under the impression that they would be identical. Original model:> lm.separate = by(data.ex, data.ex$t, function(x) lm(y ~ x1 * x2, data x))Full model:> lm.together = lm(y ~ t * x1 * x2, data = data.ex)The data are grouped by t. When I examine the coefficients, I find that they are roughly in the same ballpark, but not nearly identical:> sapply(lm.separate, coef)1 2 3 4 (Intercept) 2.691272263 1.7153565472 1.8797914048 1.9282332240 x1 0.107520721 0.0472488208 0.0440171489 0.0198376096 x2 0.054694784 0.0396246366 0.0603665574 0.0300886164 x1:x2 0.002180749 0.0003653858 -0.0001488267 -0.0007409421> coef(lm.together)(Intercept) t.L t.Q t.C x1 2.0536633597 -0.4750933962 0.5121787674 -0.2809269719 0.0546560750 x2 t.L:x1 t.Q:x1 t.C:x1 t.L:x2 0.0461936485 -0.0595422428 0.0180461803 -0.0174386682 -0.0118682844 t.Q:x2 t.C:x2 x1:x2 t.L:x1:x2 t.Q:x1:x2 -0.0076038969 -0.0194162097 0.0004140914 -0.0020749112 0.0006116237 t.C:x1:x2 -0.0003083657 (Also, why are the coefficients renamed to t.L, t.Q, etc instead of t.1, t.2?) What am I missing? Thanks for the help, Clara ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius
2009-Dec-18 13:30 UTC
[R] Testing equality of regression model on multiple groups
On Dec 17, 2009, at 5:14 PM, Clara Yuan wrote:> Hello, > >> > (Also, why are the coefficients renamed to t.L, t.Q, etc instead of > t.1, t.2?)You have coded your t variable as an ordered factor. ?ordered if you do not want the linear, quadratic, cubic coefficients for the polynomial contrasts, then remove the ordering with: t <- factor(t, is.ordered=FALSE) ... and rerun the analysis. -- David> > What am I missing? > > Thanks for the help, > Clara > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD Heritage Laboratories West Hartford, CT