influence.measures gives several measures of influence for each observation (Cook's Distance, etc) and actually flags observations that it determines are influential by any of the measures. Looks good! But how does it discriminate between the influential and non- influential observations by each of the measures? Like does it do a Bonferroni-corrected t on the residuals identified by the influence measures or some other test? Cheers, Frank Tamborello, PhD W. M. Keck Postdoctoral Fellow School of Health Information Sciences University of Texas Health Science Center, Houston [[alternative HTML version deleted]]
Viechtbauer Wolfgang (STAT)
2010-Feb-22 13:00 UTC
[R] tests for measures of influence in regression
I don't think this information can be found in the documentation, but you can always just check the actual influence.measures() and print.infl() code to find out. Most importantly, influence.measures() incldues the following code: function (model) { is.influential <- function(infmat, n) { k <- ncol(infmat) - 4 if (n <= k) stop("too few cases, n < k") absmat <- abs(infmat) result <- cbind(absmat[, 1L:k] > 1, absmat[, k + 1] > 3 * sqrt(k/(n - k)), abs(1 - infmat[, k + 2]) > (3 * k)/(n - k), pf(infmat[, k + 3], k, n - k) > 0.5, infmat[, k + 4] > (3 * k)/n) dimnames(result) <- dimnames(infmat) result } ... infmat <- cbind(dfbetas, dffit = dffits, cov.r = cov.ratio, cook.d = cooks.d, hat = h) ... is.inf <- is.influential(infmat, sum(h > 0)) ... } So, a case is flagged if: - any of its absolute dfbetas values are larger than 1, or - its absolute dffits value is larger than 3*sqrt(k/(n-k)), or - abs(1 - covratio) is larger than 3*k/(n-k), or - its Cook's distance is larger than the 50% percentile of an F-distributio with k and n-k degrees of freedom, or - its hatvalue is larger than 3*k/n, where k denotes the number of model coefficients (e.g., k = 2 for simple regression with the intercept included in the model). Best, -- Wolfgang Viechtbauer http://www.wvbauer.com/ Department of Methodology and Statistics Tel: +31 (43) 388-2277 School for Public Health and Primary Care Office Location: Maastricht University, P.O. Box 616 Room B2.01 (second floor) 6200 MD Maastricht, The Netherlands Debyeplein 1 (Randwyck) ----Original Message---- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Frank Tamborello Sent: Monday, February 22, 2010 00:39 To: r-help at r-project.org Subject: [R] tests for measures of influence in regression> influence.measures gives several measures of influence for each > observation (Cook's Distance, etc) and actually flags observations > that it determines are influential by any of the measures. Looks > good! But how does it discriminate between the influential and non- > influential observations by each of the measures? Like does it do a > Bonferroni-corrected t on the residuals identified by the influence > measures or some other test? > > Cheers, > > Frank Tamborello, PhD > W. M. Keck Postdoctoral Fellow > School of Health Information Sciences > University of Texas Health Science Center, Houston > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.