Hi,

I am not a statistician, so I am sure whatever I am doing wrong must be an obvious error to those who are. Basically, I cannot understand why I get NA for the variable 'CDSTotal' when running a glm. Does anyone have an idea of why this might be happening?

Call:  glm(formula = cbind(SRAS - 26, 182 - SRAS) ~ Age + Gender + LOC +
    PC + Stability + CDSTotal, family = binomial, data = Controlgroup)

Coefficients:
(Intercept)          Age       Gender          LOC           PC    Stability
  -2.575071     0.009148     0.354143     0.018295    -0.011317     0.090759
   CDSTotal
         NA

Degrees of Freedom: 64 Total (i.e. Null);  59 Residual
Null Deviance:      2015
Residual Deviance: 1264    AIC: 1614

Thanks
Matthew

--
View this message in context: http://r.789695.n4.nabble.com/glm-help-final-predictor-variable-NA-tp4710161.html
Sent from the R help mailing list archive at Nabble.com.
Psigh! Why do people think that it is perfectly OK to undertake statistical analyses without knowing or understanding any statistics?
(I guess it's slightly less dangerous than undertaking to do your own wiring without knowing anything about being an electrician, but still ....)

However, to stop venting and answer your question: It is because "CDSTotal" is perfectly confounded (in the given design) with the other predictors. That is, CDSTotal is exactly equal to a linear combination of the other predictors (and the constant "1").

Try:

lm(CDSTotal ~ Age + Gender + LOC + PC + Stability, data=Controlgroup)

and you will find that the error sum of squares is zero (to within numerical tolerance).

cheers,

Rolf Turner

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

On 22/07/15 06:56, matthewjones43 wrote:
> Hi, I am not a statistician and so I am sure whatever it is I am doing wrong
> must be an obvious error for those who are...Basically I can not understand
> why I get NA for variable 'CDSTotal' when running a glm? Does anyone have an
> idea of why this might be happening?
>
> Call: glm(formula = cbind(SRAS - 26, 182 - SRAS) ~ Age + Gender + LOC +
>     PC + Stability + CDSTotal, family = binomial, data = Controlgroup)
>
> Coefficients:
> (Intercept)          Age       Gender          LOC           PC    Stability
>   -2.575071     0.009148     0.354143     0.018295    -0.011317     0.090759
>    CDSTotal
>          NA
>
> Degrees of Freedom: 64 Total (i.e. Null);  59 Residual
> Null Deviance:      2015
> Residual Deviance: 1264    AIC: 1614
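The check suggested above can be tried on simulated data. This is only a sketch: the real Controlgroup data are not shown in the thread, so the data frame `d` below is made up, and the assumption that CDSTotal is the exact sum of LOC, PC and Stability is purely hypothetical, chosen to reproduce the symptom.

```r
# Simulated stand-in for Controlgroup (hypothetical; not the poster's data).
set.seed(1)
n <- 65
d <- data.frame(
  Age       = round(runif(n, 18, 70)),
  Gender    = rbinom(n, 1, 0.5),
  LOC       = round(rnorm(n, 20, 4)),
  PC        = round(rnorm(n, 25, 5)),
  Stability = round(rnorm(n, 15, 3))
)

# Assumption for illustration only: the total is an exact linear
# combination of three of the other predictors.
d$CDSTotal <- d$LOC + d$PC + d$Stability

# A response on the 26..182 scale used in the original call.
d$SRAS <- 26 + rbinom(n, 156, plogis(-2.5 + 0.01 * d$Age + 0.3 * d$Gender))

# Same model form as in the question; the redundant last term gets NA.
fit <- glm(cbind(SRAS - 26, 182 - SRAS) ~ Age + Gender + LOC + PC +
             Stability + CDSTotal,
           family = binomial, data = d)
coef(fit)["CDSTotal"]   # NA: the term is dropped as redundant

# The diagnostic regression: CDSTotal on the remaining predictors.
chk <- lm(CDSTotal ~ Age + Gender + LOC + PC + Stability, data = d)
rss <- sum(resid(chk)^2)
rss                     # zero to within numerical tolerance
```

On real data the same two calls apply with `data = Controlgroup`; a residual sum of squares that is not essentially zero would point to some other cause.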
> On Jul 21, 2015, at 7:30 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
> Psigh! Why do people think that it is perfectly OK to undertake statistical analyses without knowing or understanding any statistics?
> (I guess it's slightly less dangerous than undertaking to do your own wiring without knowing anything about being an electrician, but still ....)

Fortune?
matthewjones43 <matthew.jones <at> kellogg.ox.ac.uk> writes:

> Hi, I am not a statistician and so I am sure whatever it is I
> am doing wrong must be an obvious error for those who are...Basically I can
> not understand why I get NA for variable 'CDSTotal' when running a glm?
> Does anyone have an idea of why this might be happening?

It might be useful to look at

http://stackoverflow.com/questions/7337761/linear-regression-na-estimate-just-for-last-coefficient/7341074#7341074

The model is overparameterized: it includes a predictor that can be expressed as a linear combination of the other predictors, so its coefficient cannot be estimated separately. R handles this automatically by dropping that term and reporting its coefficient as NA.
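To see which predictors are involved in such a dependency, one option is to compare the rank of the model matrix with its number of columns and then ask alias() for the relationship. The sketch below uses made-up data: the data frame `d`, the dummy response `y`, and the choice of CDSTotal = LOC + PC + Stability are all assumptions for illustration, not taken from the poster's data.

```r
# Hypothetical data with one exactly redundant predictor.
set.seed(2)
n <- 65
d <- data.frame(
  Age       = runif(n, 18, 70),
  Gender    = rbinom(n, 1, 0.5),
  LOC       = rnorm(n, 20, 4),
  PC        = rnorm(n, 25, 5),
  Stability = rnorm(n, 15, 3)
)
d$CDSTotal <- d$LOC + d$PC + d$Stability  # assumed dependency
d$y <- rnorm(n)                           # dummy response, only to allow a fit

# Rank of the design matrix versus its number of columns.
X <- model.matrix(~ Age + Gender + LOC + PC + Stability + CDSTotal, data = d)
rank_X <- qr(X)$rank
c(columns = ncol(X), rank = rank_X)       # rank is one short of full here

# alias() reports each dropped term in terms of the retained ones.
al <- alias(lm(y ~ Age + Gender + LOC + PC + Stability + CDSTotal, data = d))
al$Complete
```

The row of `al$Complete` labelled with the dropped term gives the coefficients of the linear combination, which usually makes it clear which predictor to remove from the formula.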