thr3ads.net - R help - [R] influence measures for multivariate linear models [Aug 2010]

If this information is useful, please help other people find it:
Share via:

Michael Friendly

2010-Aug-10 14:57 UTC

[R] influence measures for multivariate linear models

Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define general classes 
of influence measures for multivariate
regression models, including analogs of Cook's D, Andrews & Pregibon 
COVRATIO, etc.  As in univariate
response models, these are based on leverage and residuals based on 
omitting one (or more) observations at
a time and refitting, although, in the univariate case, the computations 
can be optimized, as they are in
stats::influence() and related methods.

I'm interested in exploring the multivariate extension in R.  I tried 
the following, and was surprised to find that
R returned a result rather than an error -- presumably because mlm 
objects are not trapped before they
get to lm.influence()

 > # multivariate model
 > data(Rohwer, package="heplots")
 > rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, 
data=Rohwer)

 > names(influence(rohwer.mod))
[1] "hat"          "coefficients" "sigma"       
"wt.res"
 > head(influence(rohwer.mod)$coefficients, 2)
        [,1]       [,2]      [,3]     [,4]      [,5]      [,6]
[1,] 2.25039  0.0254739 -0.025252 -0.06297 -0.121507  0.094355
[2,] 0.84649 -0.0062656 -0.077430  0.08345 -0.022579 -0.059480
 >

Of course, the correct calculations would result from refitting, 
omitting each observation in turn, though doing this
directly would be horribly inefficient.
e.g, calculating B(i), deleting case i:

 > coef(update(rohwer.mod, subset=1:69 !=1, data=Rohwer))
                  SAT     PPVT     Raven
(Intercept) -2.466079 35.68664 11.510068
n            1.888286  0.60949  0.075931
s           -0.034524 -0.53040  0.160328
ns          -2.739834 -0.67355  0.066392
na           2.219340  1.20481 -0.037272
ss           1.072300  0.99033  0.058509
 > coef(update(rohwer.mod, subset=1:69 !=2, data=Rohwer))
                  SAT     PPVT      Raven
(Intercept) -1.062178 33.88199 10.8988006
n            1.920026  0.59735  0.0713976
s            0.017654 -0.47464  0.1774135
ns          -2.886254 -0.67905  0.0673686
na           2.120411  1.29016 -0.0077484
ss           1.226135  0.96430  0.0471764

Is there anything existing for this case that I've missed, or does 
anyone have an interest in pursuing this topic?

-Michael

-- 
Michael Friendly     Email: friendly AT yorku DOT ca 
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA

Peter Dalgaard

2010-Aug-10 16:20 UTC

head link

[R] influence measures for multivariate linear models

Michael Friendly wrote:> Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define general classes
> of influence measures for multivariate
> regression models, including analogs of Cook's D, Andrews &
Pregibon
> COVRATIO, etc.  As in univariate
> response models, these are based on leverage and residuals based on 
> omitting one (or more) observations at
> a time and refitting, although, in the univariate case, the computations 
> can be optimized, as they are in
> stats::influence() and related methods.
> 
> I'm interested in exploring the multivariate extension in R.  I tried 
> the following, and was surprised to find that
> R returned a result rather than an error -- presumably because mlm 
> objects are not trapped before they
> get to lm.influence()
> 
>  > # multivariate model
>  > data(Rohwer, package="heplots")
>  > rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, 
> data=Rohwer)
> 
>  > names(influence(rohwer.mod))
> [1] "hat"          "coefficients" "sigma"    
"wt.res"
>  > head(influence(rohwer.mod)$coefficients, 2)
>         [,1]       [,2]      [,3]     [,4]      [,5]      [,6]
> [1,] 2.25039  0.0254739 -0.025252 -0.06297 -0.121507  0.094355
> [2,] 0.84649 -0.0062656 -0.077430  0.08345 -0.022579 -0.059480
>  >
> 
> Of course, the correct calculations would result from refitting, 
> omitting each observation in turn, though doing this
> directly would be horribly inefficient.
> e.g, calculating B(i), deleting case i:
> 
>  > coef(update(rohwer.mod, subset=1:69 !=1, data=Rohwer))
>                   SAT     PPVT     Raven
> (Intercept) -2.466079 35.68664 11.510068
> n            1.888286  0.60949  0.075931
> s           -0.034524 -0.53040  0.160328
> ns          -2.739834 -0.67355  0.066392
> na           2.219340  1.20481 -0.037272
> ss           1.072300  0.99033  0.058509
>  > coef(update(rohwer.mod, subset=1:69 !=2, data=Rohwer))
>                   SAT     PPVT      Raven
> (Intercept) -1.062178 33.88199 10.8988006
> n            1.920026  0.59735  0.0713976
> s            0.017654 -0.47464  0.1774135
> ns          -2.886254 -0.67905  0.0673686
> na           2.120411  1.29016 -0.0077484
> ss           1.226135  0.96430  0.0471764
> 
> Is there anything existing for this case that I've missed, or does 
> anyone have an interest in pursuing this topic?
Hmm, fitted coefficients in this sort multivariate models are the same
as those in the univariate ones, so as long as you do whole-case
deletions, I would think that you should be able to reuse the 1D code. I
would conjecture that the main problem with what you currently get is
that it only pertains to the 1st column -- looks like the differences
between the two rows from lm.influence matches the differences between
the first two colums from coef(update(...)).

Since lm() only handles complete cases, casewise deletion diagnostics is
probably the best you can get, otherwise it would be interesting to see
the effect of deleting each coordinate separately.

(As you know, these matters are within my general sphere of interest,
but I'm afraid my time is too constrained at them moment for more than a
sideline view.)
> 
> -Michael
> 

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

R help - Aug 2010 - influence measures for multivariate linear models

[R] influence measures for multivariate linear models

[R] influence measures for multivariate linear models

Maybe Matching Threads