Hello:

I am trying to understand the method 'hatvalues(...)', which returns something similar to the diagonal elements of the plain vanilla hat matrix [X(X'X)^(-1)X'], but not quite.

A Fortran programmer I am not, but tracing through the code it looks as though some sort of correction based on the notion of 'leave-one-out' variance is being applied. Whatever the difference, in simulations 'hatvalues' appears to perform much better than the diagonal of the plain vanilla hat matrix for identifying outliers with Cook's distance (as in http://en.wikipedia.org/wiki/Cook's_distance).

I would like to understand a little more about the method I am using. I have downloaded the freely available references cited in the help and am in the process of digesting them. If someone with knowledge could point me to the most efficient way to understand why 'hatvalues' does what it does, that would be great.

Thanks,
Jean Yarrington
Independent consultant.
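The question mixes two things that lm.influence() reports separately: the hat values h_i, and a leave-one-out residual standard deviation (its `sigma` component), which is plausibly the "leave-one-out variance" code seen in the sources. A minimal sketch, assuming an ordinary unweighted lm() fit on the LifeCycleSavings data that ships with R, showing that both the leave-one-out sigma and Cook's distance can be built from the plain hat values without refitting. The variable names are illustrative only.

```r
# Assumes an ordinary (unweighted) least-squares fit; data from the datasets package
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

h  <- hatvalues(fit)                      # plain diagonal of X (X'X)^(-1) X'
e  <- residuals(fit)                      # ordinary residuals
p  <- fit$rank                            # number of estimated coefficients
df <- df.residual(fit)                    # n - p
s2 <- deviance(fit) / df                  # usual residual variance estimate

# Leave-one-out residual variance without refitting:
# s_(i)^2 = ((n - p) s^2 - e_i^2 / (1 - h_i)) / (n - p - 1)
s2_loo <- (df * s2 - e^2 / (1 - h)) / (df - 1)

# Cook's distance from the plain hat values and the full-data s^2:
# D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2
cook <- (e^2 / (p * s2)) * h / (1 - h)^2

# Compare with R's own functions (names dropped so only values are compared)
all.equal(unname(sqrt(s2_loo)), unname(lm.influence(fit)$sigma))
all.equal(unname(cook), unname(cooks.distance(fit)))
```

In other words, the leave-one-out step affects the scale estimate used by functions such as rstudent(), not the hat values themselves.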
Sigmund Freud wrote:
> Hello:
>
> I am trying to understand the method 'hatvalues(...)', which returns something similar to the diagonals of the plain vanilla hat matrix [X(X'X)^(-1)X'], but not quite.
>
> A Fortran programmer I am not, but tracing through the code it looks like perhaps some sort of correction based on the notion of 'leave-one-out' variance is being applied.

I can't see what the problem is. Using the LifeCycleSavings example from ?influence.measures:

lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)
X <- model.matrix(lm.SR)
H <- X %*% solve(t(X) %*% X) %*% t(X)
hats1 <- diag(H)
hats2 <- hatvalues(lm.SR)
all.equal(hats1, hats2)
#[1] TRUE

> Whatever the difference, in simulations 'hatvalues' appears to perform much better in the context of identifying outliers using Cook's Distance than the diagonals of the plain vanilla hat matrix. (As in http://en.wikipedia.org/wiki/Cook's_distance).
>
> Would prefer to understand a little more when using this method. I have downloaded the freely available references cited in the help and am in the process of digesting them. If someone with knowledge could offer a pointer on the most efficient way to get at why 'hatvalues' does what it does, that would be great.

In a nutshell, hatvalues are a measure of how unusual a point is in predictor space, i.e. to what extent it "sticks out" in one or more of the X-dimensions.

-Peter Ehlers
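The "sticks out in predictor space" description can be made exact. For a model with an intercept, h_i = 1/n + d_i^2/(n - 1), where d_i^2 is the squared Mahalanobis distance of the i-th row of predictors from their column means, using the sample covariance matrix (denominator n - 1). A short check, assuming the same LifeCycleSavings model as above; the variable names are illustrative.

```r
fit <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# Predictor columns only: drop the intercept column of the model matrix
Xp <- model.matrix(fit)[, -1]
n  <- nrow(Xp)

# Squared Mahalanobis distance of each row from the centroid of the predictors
md2 <- mahalanobis(Xp, center = colMeans(Xp), cov = cov(Xp))

# Identity for models with an intercept: h_i = 1/n + md2_i / (n - 1)
h_from_md <- 1 / n + md2 / (n - 1)

all.equal(unname(h_from_md), unname(hatvalues(fit)))

# The largest hat values belong to the rows farthest from the centroid
head(sort(hatvalues(fit), decreasing = TRUE), 3)
```

So a large hat value says nothing about the response; it only reflects distance from the centre of the X-cloud.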
Viechtbauer Wolfgang (STAT)
2009-Nov-08 13:01 UTC
[R] influence.measures(stats): hatvalues(model, ...)
Not sure what you mean.

yi <- c(2,3,2,4,3,6)
xi <- c(1,4,3,2,4,5)
res <- lm(yi ~ xi)
hatvalues(res)

X <- cbind(1, xi)
diag( X%*%solve(t(X)%*%X)%*%t(X) )

Same result.

Best,

--
Wolfgang Viechtbauer http://www.wvbauer.com/
Department of Methodology and Statistics
School for Public Health and Primary Care
Maastricht University, P.O. Box 616
6200 MD Maastricht, The Netherlands
Tel: +31 (0)43 388-2277
Office Location: Room B2.01 (second floor), Debyeplein 1 (Randwyck)

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Sigmund Freud [ss_freud_56 at yahoo.com]
Sent: Sunday, November 08, 2009 8:14 AM
To: r-help at r-project.org
Subject: [R] influence.measures(stats): hatvalues(model, ...)
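The same numbers can be obtained without forming (X'X)^(-1) explicitly. With a thin QR decomposition X = QR, the hat matrix is H = QQ', so h_i is the sum of squares of row i of Q. As far as I can tell from the stats sources, lm.influence() (which hatvalues() uses for lm fits) works from the QR decomposition stored in the fit, which would account for the compiled code traced in the original question; treat that as an assumption. A sketch using Wolfgang's toy data (variable names illustrative):

```r
yi <- c(2, 3, 2, 4, 3, 6)
xi <- c(1, 4, 3, 2, 4, 5)
res <- lm(yi ~ xi)

X <- cbind(1, xi)

# Textbook formula with an explicit inverse
h_inv <- diag(X %*% solve(t(X) %*% X) %*% t(X))

# Same quantity from a thin QR decomposition X = QR: H = Q Q', h_i = sum_j Q_ij^2
Q <- qr.Q(qr(X))
h_qr <- rowSums(Q^2)

# Compare the two routes with each other and with hatvalues()
all.equal(unname(h_inv), unname(h_qr))
all.equal(unname(h_qr), unname(hatvalues(res)))

# H is a projection, so its trace (the sum of the hat values) equals ncol(X)
sum(h_qr)
```

The QR route is numerically preferable to an explicit inverse when X'X is ill-conditioned; for well-behaved data both give the same values.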