Katki, Hormuzd (NIH/NCI)
2003-Jun-12 17:24 UTC
[R] What PRECISELY is the dfbetas() or lm.influence()$coef ?
Hello. I want to get the proper influence function for the glm coefficients in R. This is supposed to be inv(information)*(y-yhat)*x. So I am wondering what is the exact mathematical formula for the output that the functions:

dfbeta() OR lm.influence()$coefficients

return for a glm model. I am confused because:

1. Their columns don't sum to zero as influences should.
2. They return different "influences", so the 2 functions are doing something different.
3. I think they divide each element by the standard error of the corresponding coefficient, but that's not enough to resolve any discrepancies.

The documentation doesn't provide any details. Any help would be greatly appreciated.

Thank you,
Hormuzd Katki

Biostatistics Branch, Division of Cancer Epidemiology and Genetics
National Cancer Institute
6120 Executive Blvd. Room 8044 MSC 7244
Rockville, MD 20852-4910
301-594-7818 (voice)
301-402-0081 (fax)
katkih at mail.nih.gov
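For reference, the formula quoted in the question, inv(information)*(y-yhat)*x, can be evaluated by hand. The following is only a minimal sketch under stated assumptions: the data set (mtcars) and the logistic model are hypothetical choices for illustration and do not come from the original post, and the code was written against a recent R release rather than the 2003 version under discussion.

```r
# Hypothetical example: mtcars ships with R; the model is for illustration only.
fit <- glm(am ~ wt, family = binomial, data = mtcars)

X  <- model.matrix(fit)   # n x p design matrix
y  <- fit$y               # observed 0/1 response
mu <- fitted(fit)         # fitted probabilities (yhat)

# For a binomial fit the dispersion is fixed at 1, so vcov(fit) is the
# inverse of the information matrix X'WX.
inv_info <- vcov(fit)

# Row i holds inv(information) %*% x_i * (y_i - yhat_i).
# X * (y - mu) scales row i of X by the i-th raw residual.
infl_fun <- (X * (y - mu)) %*% inv_info

# With the canonical (logit) link the score equations
# sum_i x_i * (y_i - yhat_i) = 0 hold at the converged estimate,
# so each column of infl_fun sums to zero up to numerical error.
print(colSums(infl_fun))
```

These hand-computed values are what the question means by "influences"; whether dfbeta() reproduces them is a separate matter.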
John Fox
2003-Jun-12 20:58 UTC
[R] What PRECISELY is the dfbetas() or lm.influence()$coef ?
Dear Hormuzd,

At 01:24 PM 6/12/2003 -0400, Katki, Hormuzd (NIH/NCI) wrote:
> Hello. I want to get the proper influence function for the glm
> coefficients in R. This is supposed to be inv(information)*(y-yhat)*x. So
> I am wondering what is the exact mathematical formula for the output that
> the functions:
>
> dfbeta() OR lm.influence()$coefficients
>
> return for a glm model. I am confused because:
>
> 1. Their columns don't sum to zero as influences should.

Even in a linear model, where the computation is exact, this isn't the case, if influence is defined as the change in the coefficients upon deleting each observation in turn (i.e., as dfbeta).

> 2. They return different "influences", so the 2 functions are doing
> something different.

That's odd. I believe that dfbeta() for a GLM simply uses influence.glm, which has the same $coefficients component as lm.influence. As such, for a GLM, both are based on the last step of the IRLS fit -- i.e., a linearization of the model.

> 3. I think they divide each element by the standard error of the
> corresponding coefficient, but that's not enough to resolve any
> discrepancies

Perhaps you meant that dfbetas() [not dfbeta()] returns different values from lm.influence()$coef (as in your subject line)? dfbetas standardizes the coefficient changes by coefficient standard errors, using a deleted estimate of the dispersion parameter.

> The documentation doesn't provide any details. Any help would be greatly
> appreciated.

I hope that this helps,
John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
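The points in this reply can be checked numerically. What follows is a rough sketch, not a statement of how the 2003 release behaved: it assumes a recent R release, uses mtcars purely as hypothetical example data, and the standardization check for dfbetas() reflects my reading of the current stats sources, so treat it as an assumption to verify.

```r
## Linear model: dfbeta() is the exact change in coefficients on deletion.
lm_fit <- lm(mpg ~ wt, data = mtcars)   # hypothetical example model
X <- model.matrix(lm_fit)
e <- residuals(lm_fit)
h <- hatvalues(lm_fit)

# Influence-function-style rows (X'X)^{-1} x_i e_i; columns sum to zero.
infl_fun <- (X * e) %*% solve(crossprod(X))

# Deletion change b - b_(i): the same rows divided by (1 - h_i).
# The extra 1/(1 - h_i) factor is why dfbeta columns need not sum to zero.
closed_form <- infl_fun / (1 - h)
lm_matches_closed_form <- all.equal(unname(dfbeta(lm_fit)), unname(closed_form))

# Brute-force check for the first observation: refit without it.
delta_1 <- coef(lm_fit) - coef(lm(mpg ~ wt, data = mtcars[-1, ]))

## GLM: dfbeta() and lm.influence()$coefficients, then dfbetas().
glm_fit <- glm(am ~ wt, family = binomial, data = mtcars)
infl <- lm.influence(glm_fit)
glm_same <- all.equal(unname(dfbeta(glm_fit)), unname(infl$coefficients))

# Assumed standardization: divide by deleted sigma times the square root of
# the diagonal of the unscaled covariance matrix from the final IRLS step.
se_unscaled <- sqrt(diag(summary(glm_fit)$cov.unscaled))
by_hand <- dfbeta(glm_fit) / outer(infl$sigma, se_unscaled)
glm_standardized <- all.equal(unname(dfbetas(glm_fit)), unname(by_hand))

print(lm_matches_closed_form)
print(rbind(dfbeta = dfbeta(lm_fit)[1, ], refit = delta_1))
print(colSums(dfbeta(lm_fit)))
print(glm_same)
print(glm_standardized)
```

If the checks agree on your installation, dfbeta() and lm.influence()$coefficients coincide for a GLM, and dfbetas() differs from them only by the row-and-column scaling shown.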