Michael Friendly
2010-Sep-14 22:10 UTC
[R] influence measures for multivariate linear models
I'm following up on a question I posted 8/10/2010, but my newsreader has lost this thread.> Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define general > classes of influence measures for multivariate > regression models, including analogs of Cook's D, Andrews & Pregibon > COVRATIO, etc. As in univariate > response models, these are based on leverage and residuals based on > omitting one (or more) observations at > a time and refitting, although, in the univariate case, the > computations can be optimized, as they are in > stats::influence() and related methods. > > I'm interested in exploring the multivariate extension in R. I tried > the following, and was surprised to find that > R returned a result rather than an error -- presumably because mlm > objects are not trapped before they > get to lm.influence() >I've done a bit more testing, comparing what I get from R lm functions with some direct calculations in both R and SAS, and now conclude that there are bugs in lm.influence and the coef(update()) methods I tried before to get leave-one-out coefficients for multivariate response models fit with lm(). By bugs, I mean that results returned are silently wrong, or at best, misleading. My test case: a subset of the Rohwer data anayzed by Barrett & Ling (& others) > library(heplots) > data(Rohwer, package="heplots") > Rohwer1 <- Rohwer[Rohwer$SES=='Hi',] > rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer1) > (B <- coef(rohwer.mod)) SAT PPVT Raven (Intercept) -28.467471 39.6970896 13.2438356 n 3.257132 0.0672825 0.0593474 s 2.996583 0.3699843 0.4924440 ns -5.859063 -0.3743802 -0.1640219 na 5.666223 1.5230090 0.1189805 ss -0.622653 0.4101567 -0.1211564 Attempt to get the leave-one-out coefficients, B(-i), for the first 2 cases from influence(): *wrong* -- influence() should probably issue stop('Not defined for mlm objects'), or the documentation should be made clearer for what happens in this case. > head(influence(rohwer.mod, do.coef=TRUE)$coefficients, 2) [,1] [,2] [,3] [,4] [,5] [,6] [1,] -5.56830 0.0510135 -0.486096 0.61811 0.0585384 -0.1339219 [2,] 2.30754 0.1092886 0.388349 -0.40717 -0.0600899 0.0477736 The correct result, for the first case is > # direct calculation > X <- cbind(1, as.matrix(Rohwer1[,6:10])) > Y <- as.matrix(Rohwer1[,3:5]) > keep <- 2:n > (B1 <- solve(t(X[keep,]) %*% X[keep,]) %*% t(X[keep,]) %*% Y[keep,]) SAT PPVT Raven -22.899170 41.7757793 13.5057156 n 3.206119 0.0482388 0.0569482 s 3.482679 0.5514477 0.5153053 ns -6.477172 -0.6051252 -0.1930919 na 5.607684 1.5011562 0.1162274 ss -0.488731 0.4601508 -0.1148580 OK, ?influence tells me that the 'coefficients' returned are actually B-B(-i), and I can see that it gives those for the first response variable (SAT) > B-B1 SAT PPVT Raven (Intercept) -5.5683009 -2.0786897 -0.26187994 n 0.0510135 0.0190437 0.00239919 s -0.4860961 -0.1814634 -0.02286134 ns 0.6181096 0.2307451 0.02907000 na 0.0585384 0.0218528 0.00275309 ss -0.1339219 -0.0499941 -0.00629841 For SAT, same as re-fitting a univariate model: > rohwer.inf1 <- influence(lm(SAT ~ n + s + ns + na + ss, data=Rohwer1), do.coef=TRUE) > colnames(rohwer.inf1$coefficients) <- paste("SAT", colnames(rohwer.inf1$coefficients), sep=":") > head(rohwer.inf1$coefficients, 2) SAT:(Intercept) SAT:n SAT:s SAT:ns SAT:na SAT:ss 38 -5.56830 0.0510135 -0.486096 0.61811 0.0585384 -0.1339219 39 2.30754 0.1092886 0.388349 -0.40717 -0.0600899 0.0477736 > But I'm looking for a method I can generalize. I also tried the following, using subset= in update() and lm(), giving a different wrong answer. (Am I using the subset= argument correctly?) > coef(update(rohwer.mod, subset=1:n !=1, data=Rohwer1)) SAT PPVT Raven (Intercept) -28.428163 51.762326 11.0052852 n 0.912497 -0.316177 0.1294643 s 5.478358 0.280319 0.6639180 ns -6.119679 -1.828516 -0.2995350 na 5.383479 2.096793 0.1940462 ss -0.345764 0.448199 -0.0725977 Warning message: In 1:n : numerical expression has 32 elements: only the first used So, how can I calculate leave-one-out coefficients (and other quantities) efficiently for mlm-s? -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA