Michael Friendly
2010-Aug-10 14:57 UTC
[R] influence measures for multivariate linear models
Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define general classes of influence measures for multivariate regression models, including analogs of Cook's D, Andrews & Pregibon COVRATIO, etc. As in univariate response models, these are based on leverage and residuals based on omitting one (or more) observations at a time and refitting, although, in the univariate case, the computations can be optimized, as they are in stats::influence() and related methods. I'm interested in exploring the multivariate extension in R. I tried the following, and was surprised to find that R returned a result rather than an error -- presumably because mlm objects are not trapped before they get to lm.influence() > # multivariate model > data(Rohwer, package="heplots") > rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, data=Rohwer) > names(influence(rohwer.mod)) [1] "hat" "coefficients" "sigma" "wt.res" > head(influence(rohwer.mod)$coefficients, 2) [,1] [,2] [,3] [,4] [,5] [,6] [1,] 2.25039 0.0254739 -0.025252 -0.06297 -0.121507 0.094355 [2,] 0.84649 -0.0062656 -0.077430 0.08345 -0.022579 -0.059480 > Of course, the correct calculations would result from refitting, omitting each observation in turn, though doing this directly would be horribly inefficient. e.g, calculating B(i), deleting case i: > coef(update(rohwer.mod, subset=1:69 !=1, data=Rohwer)) SAT PPVT Raven (Intercept) -2.466079 35.68664 11.510068 n 1.888286 0.60949 0.075931 s -0.034524 -0.53040 0.160328 ns -2.739834 -0.67355 0.066392 na 2.219340 1.20481 -0.037272 ss 1.072300 0.99033 0.058509 > coef(update(rohwer.mod, subset=1:69 !=2, data=Rohwer)) SAT PPVT Raven (Intercept) -1.062178 33.88199 10.8988006 n 1.920026 0.59735 0.0713976 s 0.017654 -0.47464 0.1774135 ns -2.886254 -0.67905 0.0673686 na 2.120411 1.29016 -0.0077484 ss 1.226135 0.96430 0.0471764 Is there anything existing for this case that I've missed, or does anyone have an interest in pursuing this topic? -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
Peter Dalgaard
2010-Aug-10 16:20 UTC
[R] influence measures for multivariate linear models
Michael Friendly wrote:> Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define general classes > of influence measures for multivariate > regression models, including analogs of Cook's D, Andrews & Pregibon > COVRATIO, etc. As in univariate > response models, these are based on leverage and residuals based on > omitting one (or more) observations at > a time and refitting, although, in the univariate case, the computations > can be optimized, as they are in > stats::influence() and related methods. > > I'm interested in exploring the multivariate extension in R. I tried > the following, and was surprised to find that > R returned a result rather than an error -- presumably because mlm > objects are not trapped before they > get to lm.influence() > > > # multivariate model > > data(Rohwer, package="heplots") > > rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, > data=Rohwer) > > > names(influence(rohwer.mod)) > [1] "hat" "coefficients" "sigma" "wt.res" > > head(influence(rohwer.mod)$coefficients, 2) > [,1] [,2] [,3] [,4] [,5] [,6] > [1,] 2.25039 0.0254739 -0.025252 -0.06297 -0.121507 0.094355 > [2,] 0.84649 -0.0062656 -0.077430 0.08345 -0.022579 -0.059480 > > > > Of course, the correct calculations would result from refitting, > omitting each observation in turn, though doing this > directly would be horribly inefficient. > e.g, calculating B(i), deleting case i: > > > coef(update(rohwer.mod, subset=1:69 !=1, data=Rohwer)) > SAT PPVT Raven > (Intercept) -2.466079 35.68664 11.510068 > n 1.888286 0.60949 0.075931 > s -0.034524 -0.53040 0.160328 > ns -2.739834 -0.67355 0.066392 > na 2.219340 1.20481 -0.037272 > ss 1.072300 0.99033 0.058509 > > coef(update(rohwer.mod, subset=1:69 !=2, data=Rohwer)) > SAT PPVT Raven > (Intercept) -1.062178 33.88199 10.8988006 > n 1.920026 0.59735 0.0713976 > s 0.017654 -0.47464 0.1774135 > ns -2.886254 -0.67905 0.0673686 > na 2.120411 1.29016 -0.0077484 > ss 1.226135 0.96430 0.0471764 > > Is there anything existing for this case that I've missed, or does > anyone have an interest in pursuing this topic?Hmm, fitted coefficients in this sort multivariate models are the same as those in the univariate ones, so as long as you do whole-case deletions, I would think that you should be able to reuse the 1D code. I would conjecture that the main problem with what you currently get is that it only pertains to the 1st column -- looks like the differences between the two rows from lm.influence matches the differences between the first two colums from coef(update(...)). Since lm() only handles complete cases, casewise deletion diagnostics is probably the best you can get, otherwise it would be interesting to see the effect of deleting each coordinate separately. (As you know, these matters are within my general sphere of interest, but I'm afraid my time is too constrained at them moment for more than a sideline view.)> > -Michael >-- Peter Dalgaard Center for Statistics, Copenhagen Business School Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com