Hi R users: What is the difference between in the computation of the diag of the "hat" matrix in: "lm.influence" and the matrix operations with "solve()" and "t()"? I mean, this is my X matrix x1 x2 x3 x4 x5 [1,] 0.297 0.310 0.290 0.220 0.1560 [2,] 0.360 0.390 0.369 0.297 0.2050 [3,] 0.075 0.058 0.047 0.034 0.0230 [4,] 0.114 0.100 0.081 0.058 0.0420 [5,] 0.229 0.213 0.198 0.142 0.0102 [6,] 0.315 0.304 0.267 0.202 0.1470 [7,] 0.477 0.518 0.496 0.395 0.2850 [8,] 0.072 0.063 0.047 0.036 0.0240 [9,] 0.099 0.092 0.074 0.056 0.0038 [10,] 0.420 0.452 0.425 0.332 0.2350 [11,] 0.189 0.178 0.153 0.107 0.0760 [12,] 0.369 0.391 0.364 0.286 0.2000 [13,] 0.142 0.124 0.105 0.077 0.0560 [14,] 0.094 0.087 0.072 0.049 0.0320 [15,] 0.171 0.161 0.145 0.094 0.0680 [16,] 0.378 0.420 0.380 0.281 0.2000 If I use: diag(X%*%solve(t(X)%*%X)%*%t(X)) I obtain: [1] 0.15248181 0.27102872 0.11476375 0.12941386 0.90455886 0.32246292 [7] 0.43858581 0.16533854 0.37415984 0.19100227 0.17023090 0.15125134 [13] 0.17855019 0.06023773 0.52137996 0.85455350 But when I use the lm.influence() function lm.influence(mt)$hat I obtain: [1] 0.1735989 0.2999146 0.2334095 0.1455117 0.9216644 0.7553856 0.4486403 [8] 0.2755802 0.4188349 0.1914242 0.1790093 0.1573939 0.1787553 0.1975511 [15] 0.5664988 0.8568274 mt is a model of the type y~x1+x2+x3+x4+x5, where y is: y [1] 17 17 35 69 69 173 173 17 17 73 17 35 69 35 35 52 As you see the differences are no too small. Where is the problem? Is only a numerical stability problem? Thank you very much for your help Kenneth Cabrera krcabrer at perseus.unalmed.edu.co krcabrer at epm.net.co Universidad Nacional de Colombia Sede Medellin -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Thu, 7 Jun 2001, Kenneth Cabrera wrote:> Hi R users: > > What is the difference between in the computation of the diag of the > "hat" matrix in: > "lm.influence" and the matrix operations with "solve()" and "t()"?None. You have forgotten the intercept: the design matrix for y~x1+x2+x3+x4+x5 has a first column of ones. Try using cbind(1, X). [...] -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Kenneth Cabrera <krcabrer at perseus.unalmed.edu.co> writes:> Hi R users: > > What is the difference between in the computation of the diag of the > "hat" matrix in: > "lm.influence" and the matrix operations with "solve()" and "t()"? > > I mean, this is my X matrix > > x1 x2 x3 x4 x5 > [1,] 0.297 0.310 0.290 0.220 0.1560...> [16,] 0.378 0.420 0.380 0.281 0.2000 > > If I use: > diag(X%*%solve(t(X)%*%X)%*%t(X)) > I obtain: > [1] 0.15248181 0.27102872 0.11476375 0.12941386 0.90455886 0.32246292 > [7] 0.43858581 0.16533854 0.37415984 0.19100227 0.17023090 0.15125134 > [13] 0.17855019 0.06023773 0.52137996 0.85455350 > > But when I use the lm.influence() function > lm.influence(mt)$hat > I obtain: > [1] 0.1735989 0.2999146 0.2334095 0.1455117 0.9216644 0.7553856 > 0.4486403 > [8] 0.2755802 0.4188349 0.1914242 0.1790093 0.1573939 0.1787553 > 0.1975511 > [15] 0.5664988 0.8568274 > mt is a model of the type y~x1+x2+x3+x4+x5, where y is: > y > [1] 17 17 35 69 69 173 173 17 17 73 17 35 69 35 35 52 > > As you see the differences are no too small. > Where is the problem? Is only a numerical stability problem? > > Thank you very much for your helpThe intercept? What if you use X2<-cbind(1,X) or y~x1+x2+x3+x4+x5-1 -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Kenneth Cabrera <krcabrer at perseus.unalmed.edu.co> writes:> What is the difference between in the computation of the diag of the > "hat" matrix in: > "lm.influence" and the matrix operations with "solve()" and "t()"?> I mean, this is my X matrix> x1 x2 x3 x4 x5 > [1,] 0.297 0.310 0.290 0.220 0.1560 > [2,] 0.360 0.390 0.369 0.297 0.2050 > [3,] 0.075 0.058 0.047 0.034 0.0230 > [4,] 0.114 0.100 0.081 0.058 0.0420 > [5,] 0.229 0.213 0.198 0.142 0.0102 > [6,] 0.315 0.304 0.267 0.202 0.1470 > [7,] 0.477 0.518 0.496 0.395 0.2850 > [8,] 0.072 0.063 0.047 0.036 0.0240 > [9,] 0.099 0.092 0.074 0.056 0.0038 > [10,] 0.420 0.452 0.425 0.332 0.2350 > [11,] 0.189 0.178 0.153 0.107 0.0760 > [12,] 0.369 0.391 0.364 0.286 0.2000 > [13,] 0.142 0.124 0.105 0.077 0.0560 > [14,] 0.094 0.087 0.072 0.049 0.0320 > [15,] 0.171 0.161 0.145 0.094 0.0680 > [16,] 0.378 0.420 0.380 0.281 0.2000If your model formula is y~x1+x2+x3+x4+x5 then the model matrix would have a column of 1's followed by what you wrote. Try defining X as X <- model.matrix(y~x1+x2+x3+x4+x5, data = <whatever>) and re-doing your calculation. In general, please note that> diag(X%*%solve(t(X)%*%X)%*%t(X))is a *very* bad way of calculating the influences. Whenever you find yourself writing something like solve(t(X)%*%X) you're doing the numerical linear algebra badly. The rule of thumb in numerical linear algebra is that if the best way you can think of doing a calculation involves taking the inverse of a matrix then you need to think more about the calculation. The way that lm.influence calculates the influences is through an orthogonal-triangular decomposition which will be both faster and more accurate than what you have written. On today's machines "faster" doesn't matter that much for small or moderate sized data sets. On very large data sets faster still does matter. "Accurate" should always be a concern.> I obtain: > [1] 0.15248181 0.27102872 0.11476375 0.12941386 0.90455886 0.32246292 > [7] 0.43858581 0.16533854 0.37415984 0.19100227 0.17023090 0.15125134 > [13] 0.17855019 0.06023773 0.52137996 0.85455350 > > But when I use the lm.influence() function > lm.influence(mt)$hat > I obtain: > [1] 0.1735989 0.2999146 0.2334095 0.1455117 0.9216644 0.7553856 > 0.4486403 > [8] 0.2755802 0.4188349 0.1914242 0.1790093 0.1573939 0.1787553 > 0.1975511 > [15] 0.5664988 0.8568274 > mt is a model of the type y~x1+x2+x3+x4+x5, where y is: > y > [1] 17 17 35 69 69 173 173 17 17 73 17 35 69 35 35 52 > > As you see the differences are no too small. > Where is the problem? Is only a numerical stability problem?-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._