Kiyoshi Sasaki
2010-Jul-09 02:46 UTC
[R] Appropriate tests for logistic regression with a continuous predictor variable and Bernoulli response variable
I have a data with binary response variable, repcnd (pregnant or not) and one predictor continuous variable, svl (body size) as shown below. I did Hosmer-Lemeshow test as a goodness of fit (as suggested by a kind “R-helper” previously). To test whether the predictor (svl, or body size) has significant effect on predicting whether or not a female snake is pregnant, I used the differences between null deviance and residual deviance using a code as following: 1-pchisq(mod.fit$null.deviance - mod.fit$deviance, mod.fit$df.null - mod.fit$df.residual) Could anyone tell me whether I did the test properly? I did this test because I thought Wald test/z score listed in the output from "summary(mod.fit)" is not appropriate for a kind of data I have. Does R have automated function to run appropriate tests? I have pasted my R output below. Thank you in advance for your time and help. Kiyoshi repcnd svl 1 1 51.5 2 1 52.5 <edited> 294 0 59.8 298 1 60.0 300 1 51.7 301 1 57.4 302 1 60.9 303 0 56.8 304 0 50.0 -------------------> mod.fit <- glm(formula = gb.no.M$repcnd ~ gb.no.M$svl, family = binomial(link = logit), data = gb.no.M, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F)) > summary(mod.fit)Call: glm(formula = gb.no.M$repcnd ~ gb.no.M$svl, family = binomial(link = logit), data = gb.no.M, na.action = na.exclude, control = list(epsilon = 1e-04, maxit = 50, trace = F)) Deviance Residuals: Min 1Q Median 3Q Max -1.757 -1.109 0.734 1.113 1.632 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -7.08565 1.84106 -3.849 0.000119 *** gb.no.M$svl 0.13529 0.03474 3.894 9.85e-05 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 301.92 on 217 degrees of freedom Residual deviance: 285.04 on 216 degrees of freedom (8 observations deleted due to missingness) AIC: 289.04 Number of Fisher Scoring iterations: 3 -------------------------------------------------------------------------------> Hosmer-Lemeshow test > > hosmerlem <- function (y, yhat, g = 10)+ { + cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), include.lowest = T) + obs <- xtabs(cbind(1 - y, y) ~ cutyhat) + expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat) + chisq <- sum((obs - expect)^2/expect) + P <- 1 - pchisq(chisq, g - 2) + c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P) + }> > mod.fit <- glm(formula = no.NA$repcnd ~ no.NA$svl, family = binomial(link = logit), data = no.NA, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F))> hosmerlem(no.NA$repcnd, fitted(mod.fit))X^2 Df P(>Chi) 6.8742531 8.0000000 0.5502587 ---------------------------------------------------------------------------------------------------> list(p.value = round(1-pchisq(mod.fit$null.deviance - mod.fit$deviance,+ mod.fit$df.null- mod.fit$df.residual),6), + df = mod.fit$df.null- mod.fit$df.residual, + change = mod.fit$null.deviance - mod.fit$deviance) $p.value [1] 4e-05 $df [1] 1 $change [1] 16.87895 [[alternative HTML version deleted]]
Eik Vettorazzi
2010-Jul-09 12:58 UTC
[R] Appropriate tests for logistic regression with a continuous predictor variable and Bernoulli response variable
Have a look at ?anova or rather ?anova.glm hth Am 09.07.2010 04:46, schrieb Kiyoshi Sasaki:> I have a data with binary response variable, repcnd (pregnant or not) and one predictor continuous variable, svl (body size) as shown below. I did Hosmer-Lemeshow test as a goodness of fit (as suggested by a kind ?R-helper? previously). To test whether the predictor (svl, or body size) has significant effect on predicting whether or not a female snake is pregnant, I used the differences between null deviance and residual deviance using a code as following: > > > 1-pchisq(mod.fit$null.deviance - mod.fit$deviance, mod.fit$df.null - mod.fit$df.residual) > > Could anyone tell me whether I did the test properly? I did this test because I thought Wald test/z score listed in the output from "summary(mod.fit)" is not appropriate for a kind of data I have. Does R have automated function to run appropriate tests? I have pasted my R output below. > > Thank you in advance for your time and help. > > Kiyoshi > > > repcnd svl > 1 1 51.5 > 2 1 52.5 > <edited> > 294 0 59.8 > 298 1 60.0 > 300 1 51.7 > 301 1 57.4 > 302 1 60.9 > 303 0 56.8 > 304 0 50.0 > ------------------- > >> mod.fit <- glm(formula = gb.no.M$repcnd ~ gb.no.M$svl, family = binomial(link = logit), data = gb.no.M, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F)) >> summary(mod.fit) >> > > Call: > glm(formula = gb.no.M$repcnd ~ gb.no.M$svl, family = binomial(link = logit), > data = gb.no.M, na.action = na.exclude, control = list(epsilon = 1e-04, > maxit = 50, trace = F)) > > Deviance Residuals: > Min 1Q Median 3Q Max > -1.757 -1.109 0.734 1.113 1.632 > > Coefficients: > Estimate Std. Error z value Pr(>|z|) > (Intercept) -7.08565 1.84106 -3.849 0.000119 *** > gb.no.M$svl 0.13529 0.03474 3.894 9.85e-05 *** > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 301.92 on 217 degrees of freedom > Residual deviance: 285.04 on 216 degrees of freedom > (8 observations deleted due to missingness) > AIC: 289.04 > > Number of Fisher Scoring iterations: 3 > ------------------------------------------------------------------------------- > >> Hosmer-Lemeshow test >> >> hosmerlem <- function (y, yhat, g = 10) >> > + { > + cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)), include.lowest = T) > + obs <- xtabs(cbind(1 - y, y) ~ cutyhat) > + expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat) > + chisq <- sum((obs - expect)^2/expect) > + P <- 1 - pchisq(chisq, g - 2) > + c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P) > + } > >> mod.fit <- glm(formula = no.NA$repcnd ~ no.NA$svl, family = binomial(link = logit), data = no.NA, na.action = na.exclude, control = list(epsilon = 0.0001, maxit = 50, trace = F)) >> > > >> hosmerlem(no.NA$repcnd, fitted(mod.fit)) >> > X^2 Df P(>Chi) > 6.8742531 8.0000000 0.5502587 > --------------------------------------------------------------------------------------------------- > >> list(p.value = round(1-pchisq(mod.fit$null.deviance - mod.fit$deviance, >> > + mod.fit$df.null- mod.fit$df.residual),6), > + df = mod.fit$df.null- mod.fit$df.residual, > + change = mod.fit$null.deviance - mod.fit$deviance) > > $p.value > [1] 4e-05 > > $df > [1] 1 > > $change > [1] 16.87895 > > > > [[alternative HTML version deleted]] > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Eik Vettorazzi Institut f?r Medizinische Biometrie und Epidemiologie Universit?tsklinikum Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790