I'd like to model the relationship between m responses Y1, ..., Ym and a single set of predictor variables X1, ..., Xr. Each response is assumed to follow its own regression model, and the error terms in each model can be correlated. My understanding is that although lm() handles vector Y's on the left-hand side of the model formula, it really just fits m separate lm models. What should I use to do a full multivariate analysis (as in section 7.7 of Johnson/Wichern)? Thanks. Iain Pardoe <ipardoe at lcbmail.uoregon.edu> University of Oregon
Iain Pardoe said the following on 2005-05-04 20:08:> I'd like to model the relationship between m responses Y1, ..., Ym and a > single set of predictor variables X1, ..., Xr. Each response is assumed > to follow its own regression model, and the error terms in each model > can be correlated. My understanding is that although lm() handles > vector Y's on the left-hand side of the model formula, it really just > fits m separate lm models. What should I use to do a full multivariate > analysis (as in section 7.7 of Johnson/Wichern)? Thanks.Have you tried using the `lm' function? Note that R 2.1.0 added several useful functions for multivariate responses. Let's replicate Johnson & Wichern's Example 7.8 (p. 413, in the 4th Ed.) using `lm': > ex7.8 <- data.frame(z1 = c(0, 1, 2, 3, 4), + y1 = c(1, 4, 3, 8, 9), + y2 = c(-1, -1, 2, 3, 2)) > > f.mlm <- lm(cbind(y1, y2) ~ z1, data = ex7.8) > summary(f.mlm) Response y1 : Call: lm(formula = y1 ~ z1, data = ex7.8) Residuals: 1 2 3 4 5 6.880e-17 1.000e+00 -2.000e+00 1.000e+00 -1.326e-16 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.0000 1.0954 0.913 0.4286 z1 2.0000 0.4472 4.472 0.0208 * --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.414 on 3 degrees of freedom Multiple R-Squared: 0.8696, Adjusted R-squared: 0.8261 F-statistic: 20 on 1 and 3 DF, p-value: 0.02084 Response y2 : Call: lm(formula = y2 ~ z1, data = ex7.8) Residuals: 1 2 3 4 5 9.931e-17 -1.000e+00 1.000e+00 1.000e+00 -1.000e+00 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -1.0000 0.8944 -1.118 0.3450 z1 1.0000 0.3651 2.739 0.0714 . --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.155 on 3 degrees of freedom Multiple R-Squared: 0.7143, Adjusted R-squared: 0.619 F-statistic: 7.5 on 1 and 3 DF, p-value: 0.07142 > SSD(f.mlm) $SSD y1 y2 y1 6 -2 y2 -2 4 $call lm(formula = cbind(y1, y2) ~ z1, data = ex7.8) $df [1] 3 attr(,"class") [1] "SSD" > f.mlm1 <- update(f.mlm, ~ 1) > anova(f.mlm, f.mlm1) Analysis of Variance Table Model 1: cbind(y1, y2) ~ z1 Model 2: cbind(y1, y2) ~ 1 Res.Df Df Gen.var. Pillai approx F num Df den Df Pr(>F) 1 3 1.4907 2 4 1 4.4721 0.9375 15.0000 2 2 0.0625 . --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 HTH, Henric
Thanks Henric. If I may I'd like to go a little further ... For example, Johnson and Wichern's example 7.10:> ex7.10 <-+ data.frame(y1 = c(141.5, 168.9, 154.8, 146.5, 172.8, 160.1, 108.5), + y2 = c(301.8, 396.1, 328.2, 307.4, 362.4, 369.5, 229.1), + z1 = c(123.5, 146.1, 133.9, 128.5, 151.5, 136.2, 92), + z2 = c(2.108, 9.213, 1.905, .815, 1.061, 8.603, 1.125))> attach(ex7.10) > f.mlm <- lm(cbind(y1,y2)~z1+z2) > y.hat <- c(1, 130, 7.5) %*% coef(f.mlm) > round(y.hat, 2)y1 y2 [1,] 151.84 349.63> qf.z <- t(c(1, 130, 7.5)) %*%+ solve(t(cbind(1,z1,z2)) %*% cbind(1,z1,z2)) %*% + c(1, 130, 7.5)> round(qf.z, 5)[,1] [1,] 0.36995> n.sigma.hat <- SSD(f.mlm)$SSD > round(n.sigma.hat, 2)y1 y2 y1 5.80 5.22 y2 5.22 12.57> F.quant <- qf(.95,2,3) > round(F.quant, 2)[1] 9.55 This gives me all the information I need to calculate a 95% confidence ellipse for y=(y1,y2) at (z1,z2)=(130,7.5) using JW's equation (7-46): (y-y.hat) %*% ((n-r-1) * solve(n.sigma.hat)) %*% t(y-y.hat) <= qf.z * (m*(n-r-1)/(n-r-m)) * F.quant But, what if instead I'd like to sample (y1,y2) values from this distribution? I can sample from an F(m,n-r-m) distribution easily enough, but then how do I transform this to a single point in (y1,y2) space? Any ideas would be gratefully appreciated. Thanks.
On Wednesday 04 May 2005 20:08, Iain Pardoe wrote:> I'd like to model the relationship between m responses Y1, ..., Ym and a > single set of predictor variables X1, ..., Xr. Each response is assumed > to follow its own regression model, and the error terms in each model > can be correlated. My understanding is that although lm() handles > vector Y's on the left-hand side of the model formula, it really just > fits m separate lm models. What should I use to do a full multivariate > analysis (as in section 7.7 of Johnson/Wichern)? Thanks.I don't know Johnson/Wichern. However, it seems to me that you want to do a "seemingly unrelated regression" (SUR). You can do this with the package "systemfit". Arne> Iain Pardoe <ipardoe at lcbmail.uoregon.edu> > University of Oregon > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html-- Arne Henningsen Department of Agricultural Economics University of Kiel Olshausenstr. 40 D-24098 Kiel (Germany) Tel: +49-431-880 4445 Fax: +49-431-880 1397 ahenningsen at agric-econ.uni-kiel.de http://www.uni-kiel.de/agrarpol/ahenningsen/
Apparently Analagous Threads
- summary(object, test=c("Roy", "Wilks", "Pillai", ....) AND ellipse(object, center=....)
- multivariate regression and lm()
- Sampling from multivariate multiple regression prediction regions
- Multivariate response methods question
- passing an extra argument to an S3 generic