thr3ads.net - R help - [R] Diag "Hat" matrix [Jun 2001]

If this information is useful, please help other people find it:
Share via:

Kenneth Cabrera

2001-Jun-07 15:51 UTC

[R] Diag "Hat" matrix

Hi R users:

What is the difference between in the computation of the diag of the
"hat" matrix in:
"lm.influence" and the matrix operations with "solve()" and
"t()"?

I mean, this is my X matrix

        x1    x2    x3    x4     x5
 [1,] 0.297 0.310 0.290 0.220 0.1560
 [2,] 0.360 0.390 0.369 0.297 0.2050
 [3,] 0.075 0.058 0.047 0.034 0.0230
 [4,] 0.114 0.100 0.081 0.058 0.0420
 [5,] 0.229 0.213 0.198 0.142 0.0102
 [6,] 0.315 0.304 0.267 0.202 0.1470
 [7,] 0.477 0.518 0.496 0.395 0.2850
 [8,] 0.072 0.063 0.047 0.036 0.0240
 [9,] 0.099 0.092 0.074 0.056 0.0038
[10,] 0.420 0.452 0.425 0.332 0.2350
[11,] 0.189 0.178 0.153 0.107 0.0760
[12,] 0.369 0.391 0.364 0.286 0.2000
[13,] 0.142 0.124 0.105 0.077 0.0560
[14,] 0.094 0.087 0.072 0.049 0.0320
[15,] 0.171 0.161 0.145 0.094 0.0680
[16,] 0.378 0.420 0.380 0.281 0.2000

If I use:
diag(X%*%solve(t(X)%*%X)%*%t(X))
I obtain:
 [1] 0.15248181 0.27102872 0.11476375 0.12941386 0.90455886 0.32246292
 [7] 0.43858581 0.16533854 0.37415984 0.19100227 0.17023090 0.15125134
[13] 0.17855019 0.06023773 0.52137996 0.85455350

But when I use the lm.influence() function
lm.influence(mt)$hat
I obtain:
 [1] 0.1735989 0.2999146 0.2334095 0.1455117 0.9216644 0.7553856
0.4486403
 [8] 0.2755802 0.4188349 0.1914242 0.1790093 0.1573939 0.1787553
0.1975511
[15] 0.5664988 0.8568274
mt is a model of the type y~x1+x2+x3+x4+x5, where y is:
y
[1]  17  17  35  69  69 173 173  17  17  73  17  35  69  35  35  52

As you see the differences are no too small.
Where is the problem? Is only a numerical stability problem?

Thank you very much for your help


Kenneth Cabrera
krcabrer at perseus.unalmed.edu.co
krcabrer at epm.net.co
Universidad Nacional de Colombia
Sede Medellin

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Prof Brian Ripley

2001-Jun-07 16:00 UTC

head link

[R] Diag "Hat" matrix

On Thu, 7 Jun 2001, Kenneth Cabrera wrote:
> Hi R users:
>
> What is the difference between in the computation of the diag of the
> "hat" matrix in:
> "lm.influence" and the matrix operations with "solve()"
and "t()"?
None.  You have forgotten the intercept: the design matrix for

y~x1+x2+x3+x4+x5

has a first column of ones.  Try using cbind(1, X).

[...]

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  stats.ox.ac.uk/~ripley
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard BSA

2001-Jun-07 16:11 UTC

head link

[R] Diag "Hat" matrix

Kenneth Cabrera <krcabrer at perseus.unalmed.edu.co> writes:
> Hi R users:
> 
> What is the difference between in the computation of the diag of the
> "hat" matrix in:
> "lm.influence" and the matrix operations with "solve()"
and "t()"?
> 
> I mean, this is my X matrix
> 
>         x1    x2    x3    x4     x5
>  [1,] 0.297 0.310 0.290 0.220 0.1560
...> [16,] 0.378 0.420 0.380 0.281 0.2000
> 
> If I use:
> diag(X%*%solve(t(X)%*%X)%*%t(X))
> I obtain:
>  [1] 0.15248181 0.27102872 0.11476375 0.12941386 0.90455886 0.32246292
>  [7] 0.43858581 0.16533854 0.37415984 0.19100227 0.17023090 0.15125134
> [13] 0.17855019 0.06023773 0.52137996 0.85455350
> 
> But when I use the lm.influence() function
> lm.influence(mt)$hat
> I obtain:
>  [1] 0.1735989 0.2999146 0.2334095 0.1455117 0.9216644 0.7553856
> 0.4486403
>  [8] 0.2755802 0.4188349 0.1914242 0.1790093 0.1573939 0.1787553
> 0.1975511
> [15] 0.5664988 0.8568274
> mt is a model of the type y~x1+x2+x3+x4+x5, where y is:
> y
> [1]  17  17  35  69  69 173 173  17  17  73  17  35  69  35  35  52
> 
> As you see the differences are no too small.
> Where is the problem? Is only a numerical stability problem?
> 
> Thank you very much for your help
The intercept? What if you use X2<-cbind(1,X) or y~x1+x2+x3+x4+x5-1

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Douglas Bates

2001-Jun-07 16:19 UTC

head link

[R] Diag "Hat" matrix

Kenneth Cabrera <krcabrer at perseus.unalmed.edu.co> writes:
> What is the difference between in the computation of the diag of the
> "hat" matrix in:
> "lm.influence" and the matrix operations with "solve()"
and "t()"?
> I mean, this is my X matrix
>         x1    x2    x3    x4     x5
>  [1,] 0.297 0.310 0.290 0.220 0.1560
>  [2,] 0.360 0.390 0.369 0.297 0.2050
>  [3,] 0.075 0.058 0.047 0.034 0.0230
>  [4,] 0.114 0.100 0.081 0.058 0.0420
>  [5,] 0.229 0.213 0.198 0.142 0.0102
>  [6,] 0.315 0.304 0.267 0.202 0.1470
>  [7,] 0.477 0.518 0.496 0.395 0.2850
>  [8,] 0.072 0.063 0.047 0.036 0.0240
>  [9,] 0.099 0.092 0.074 0.056 0.0038
> [10,] 0.420 0.452 0.425 0.332 0.2350
> [11,] 0.189 0.178 0.153 0.107 0.0760
> [12,] 0.369 0.391 0.364 0.286 0.2000
> [13,] 0.142 0.124 0.105 0.077 0.0560
> [14,] 0.094 0.087 0.072 0.049 0.0320
> [15,] 0.171 0.161 0.145 0.094 0.0680
> [16,] 0.378 0.420 0.380 0.281 0.2000
If your model formula is y~x1+x2+x3+x4+x5 then the model matrix would
have a column of 1's followed by what you wrote.  Try defining X as
 X <- model.matrix(y~x1+x2+x3+x4+x5, data = <whatever>)
and re-doing your calculation.

In general, please note that
> diag(X%*%solve(t(X)%*%X)%*%t(X))
is a *very* bad way of calculating the influences.  Whenever you find
yourself writing something like solve(t(X)%*%X) you're doing the
numerical linear algebra badly.  The rule of thumb in numerical
linear algebra is that if the best way you can think of doing a
calculation involves taking the inverse of a matrix then you need to
think more about the calculation.

The way that lm.influence calculates the influences is through an
orthogonal-triangular decomposition which will be both faster and more
accurate than what you have written.

On today's machines "faster" doesn't matter that much for
small or
moderate sized data sets.  On very large data sets faster still does
matter. "Accurate" should always be a concern.
> I obtain:
>  [1] 0.15248181 0.27102872 0.11476375 0.12941386 0.90455886 0.32246292
>  [7] 0.43858581 0.16533854 0.37415984 0.19100227 0.17023090 0.15125134
> [13] 0.17855019 0.06023773 0.52137996 0.85455350
> 
> But when I use the lm.influence() function
> lm.influence(mt)$hat
> I obtain:
>  [1] 0.1735989 0.2999146 0.2334095 0.1455117 0.9216644 0.7553856
> 0.4486403
>  [8] 0.2755802 0.4188349 0.1914242 0.1790093 0.1573939 0.1787553
> 0.1975511
> [15] 0.5664988 0.8568274
> mt is a model of the type y~x1+x2+x3+x4+x5, where y is:
> y
> [1]  17  17  35  69  69 173 173  17  17  73  17  35  69  35  35  52
> 
> As you see the differences are no too small.
> Where is the problem? Is only a numerical stability problem?-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

R help - Jun 2001 - Diag "Hat" matrix

[R] Diag "Hat" matrix

[R] Diag "Hat" matrix

[R] Diag "Hat" matrix

[R] Diag "Hat" matrix