Pankaj Choudhary
2003-Jan-21 03:29 UTC
[R] Logistic regression: At times correlation matrix of coefficients gets messed up
Hi,

When I include a categorical variable (RACE with 3 levels - "white", "black" and "other") in my logistic regression model, the correlation matrix of the coefficients gets messed up. I get something like:

-----------------------------------------
Correlation of Coefficients:
( A L RACEb
AGE , 1
LWT , 1
RACEblack 1
RACEother . .
attr(,"legend")
[1] 0 ` ' 0.3 `.' 0.6 `,' 0.8 `+' 0.9 `*' 0.95 `B' 1
-------------------------------------

I couldn't figure out how to interpret it. Here is the sequence of commands and the complete output. (I am using R 1.6.2)

-----------------------------------------
> lowbwt.alr <- glm(LOW~AGE+LWT+RACE, family=binomial, data=lowbwt)
> summary(lowbwt.alr, correlation=TRUE)

Call:
glm(formula = LOW ~ AGE + LWT + RACE, family = binomial, data = lowbwt)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.4052  -0.8946  -0.7209   1.2484   2.0982

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.306741   1.069558   1.222   0.2218
AGE         -0.025524   0.033244  -0.768   0.4426
LWT         -0.014353   0.006521  -2.201   0.0277 *
RACEblack    1.003821   0.497957   2.016   0.0438 *
RACEother    0.443460   0.360184   1.231   0.2182
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 234.67  on 188  degrees of freedom
Residual deviance: 222.66  on 184  degrees of freedom
AIC: 232.66

Number of Fisher Scoring iterations: 3

Correlation of Coefficients:
( A L RACEb
AGE , 1
LWT , 1
RACEblack 1
RACEother . .
attr(,"legend")
[1] 0 ` ' 0.3 `.' 0.6 `,' 0.8 `+' 0.9 `*' 0.95 `B' 1
---------------------------------------------------------------------

Strangely enough, when I just use (AGE and RACE) or (LWT and RACE) or (AGE and LWT) or just RACE as the explanatory variable(s), there is no problem.

Am I doing something wrong? I will greatly appreciate any help.

With best wishes,
Pankaj Choudhary
U. of Texas at Dallas
Prof Brian D Ripley
2003-Jan-21 09:29 UTC
[R] Logistic regression: At times correlation matrix of coefficients gets messed up
It's not messed up, just someone's idea of a compact display. Options are

1) Use vcov(fit) instead
2) Use print(summary(fit), symbolic.cor=FALSE)

Does anyone think that the current arrangement (use this scheme for more than 4 coefficients) is sensible? Surely the abbreviations are not ("(" for intercept?), and why is the diagonal being shown but the top row and last column have been omitted? If the whole matrix was shown, the column labels could be omitted.

I'd much prefer symbolic.cor=FALSE to be the default.

On Mon, 20 Jan 2003, Pankaj Choudhary wrote:

>
> Hi,
>
> When I include a categorical variable (RACE with 3 levels - "white",
> "black" and "other") in my logistic regression model, the correlation
> matrix of the coefficients gets messed up. I get something like:
>
> -----------------------------------------
> Correlation of Coefficients:
> ( A L RACEb
> AGE , 1
> LWT , 1
> RACEblack 1
> RACEother . .
> attr(,"legend")
> [1] 0 ` ' 0.3 `.' 0.6 `,' 0.8 `+' 0.9 `*' 0.95 `B' 1
> -------------------------------------
>
> I couldn't figure out how to interpret it. Here is the sequence of
> commands and the complete output. (I am using R 1.6.2)

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
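
The two options in the reply above can be tried out with a short sketch. The original lowbwt data are not part of this thread, so the example builds a simulated stand-in (lowbwt_sim, with made-up values) that only mimics the variable names; fit, V and R are likewise illustrative names. Note that vcov() returns covariances rather than correlations, so the sketch rescales them with cov2cor(). For reading the symbolic display, the legend gives the cut points: a blank stands for an absolute correlation up to 0.3, `.' for 0.3 to 0.6, `,' for 0.6 to 0.8, and so on up to `B' for values above 0.95.

```r
# Simulated stand-in for the poster's data. The real `lowbwt` data set is not
# reproduced in the thread, so every value here is made up for illustration.
set.seed(1)
n <- 189
lowbwt_sim <- data.frame(
  AGE  = round(runif(n, 14, 45)),
  LWT  = round(rnorm(n, mean = 130, sd = 30)),
  RACE = factor(sample(c("white", "black", "other"), n, replace = TRUE),
                levels = c("white", "black", "other"))
)
lowbwt_sim$LOW <- rbinom(n, size = 1, prob = 0.3)

fit <- glm(LOW ~ AGE + LWT + RACE, family = binomial, data = lowbwt_sim)

# Option 1: variance-covariance matrix of the coefficient estimates,
# rescaled to a numeric correlation matrix.
V <- vcov(fit)
R <- cov2cor(V)
print(round(R, 2))

# Option 2: ask the summary print method for numbers instead of symbols.
print(summary(fit, correlation = TRUE), symbolic.cor = FALSE)

# The compact display comes from symnum(); calling it directly on the
# numeric matrix shows the symbols together with their legend.
print(symnum(R))
```

Comparing the output of print(round(R, 2)) with print(symnum(R)) side by side shows which numeric range each symbol stands for. In recent versions of R the summary method appears to default to symbolic.cor = FALSE, so the numeric matrix is printed without any extra argument; that is worth checking against the help page for summary.glm in the installed version.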