Malcolm Fairbrother
2012-May-03 16:37 UTC
[R] overlapping confidence bands for predicted probabilities from a logistic model
Dear list, I'm a bit perplexed why the 95% confidence bands for the predicted probabilities for units where x=0 and x=1 overlap in the following instance. I've simulated binary data to which I've then fitted a simple logistic regression model, with one covariate, and the coefficient on x is statistically significant at the 0.05 level. I've then used two different methods to generate 95% confidence bands for predicted probabilities, for each of two possible values of x. Given the result of the model, I expected the bands not to overlap? but they do. Can anyone please explain where I've gone wrong with the following code, OR why it makes sense for the bands to overlap, despite the statistically significant beta coefficient? There may be a good statistical reason for this, but I'm not aware of it. Many thanks, Malcolm Fairbrother n <- 120 set.seed(030512) x <- rbinom(n, 1, 0.5) dat <- within(data.frame(x), ybe <- rbinom(n, 1, plogis(-0.5 + x))) mod1 <- glm(ybe ~ x, dat, family=binomial) summary(mod1) # coefficient on x is statistically significant at the 0.05 level? almost at the 0.01 level pred <- predict(mod1, newdata=data.frame(x=c(0,1)), se.fit=T) with(pred, cbind(low = plogis(fit - 1.96*se.fit), est = plogis(fit), up = plogis(fit + 1.96*se.fit))) # confidence bands based on SEs # simulation-based confidence bands: sims <- t(replicate(200, coef(glm(simulate(mod1)$sim_1 ~ x, data=dat, family=binomial)))) pred0 <- plogis(quantile(sims%*%c(1,0), c(0.025, 0.5, 0.975))) pred1 <- plogis(quantile(sims%*%c(1,1), c(0.025, 0.5, 0.975))) rbind(pred0, pred1) # the upper bound of the prediction for x=0 is greater than the lower bound of the prediction for x=1, using both methods
Bert Gunter
2012-May-03 18:10 UTC
[R] overlapping confidence bands for predicted probabilities from a logistic model
Your post suggests statistical confusion on several levels. But as this concerns statistics, not R, consult your local statistician or post on a statistical help list (e.g. stats.stackexchange.com). Incidentally, the short answer to your question about the overlap is: why shouldn't they? You will hopefully receive a more informative longer answer from the above suggested (or similar) resources. -- Bert On Thu, May 3, 2012 at 9:37 AM, Malcolm Fairbrother <m.fairbrother at bristol.ac.uk> wrote:> Dear list, > > I'm a bit perplexed why the 95% confidence bands for the predicted probabilities for units where x=0 and x=1 overlap in the following instance. > > I've simulated binary data to which I've then fitted a simple logistic regression model, with one covariate, and the coefficient on x is statistically significant at the 0.05 level. I've then used two different methods to generate 95% confidence bands for predicted probabilities, for each of two possible values of x. Given the result of the model, I expected the bands not to overlap? but they do. > > Can anyone please explain where I've gone wrong with the following code, OR why it makes sense for the bands to overlap, despite the statistically significant beta coefficient? There may be a good statistical reason for this, but I'm not aware of it. > > Many thanks, > Malcolm Fairbrother > > > n <- 120 > set.seed(030512) > x <- rbinom(n, 1, 0.5) > dat <- within(data.frame(x), ybe <- rbinom(n, 1, plogis(-0.5 + x))) > mod1 <- glm(ybe ~ x, dat, family=binomial) > summary(mod1) # coefficient on x is statistically significant at the 0.05 level? almost at the 0.01 level > > pred <- predict(mod1, newdata=data.frame(x=c(0,1)), se.fit=T) > with(pred, cbind(low = plogis(fit - 1.96*se.fit), est = plogis(fit), up = plogis(fit + 1.96*se.fit))) ?# confidence bands based on SEs > > # simulation-based confidence bands: > sims <- t(replicate(200, coef(glm(simulate(mod1)$sim_1 ~ x, data=dat, family=binomial)))) > pred0 <- plogis(quantile(sims%*%c(1,0), c(0.025, 0.5, 0.975))) > pred1 <- plogis(quantile(sims%*%c(1,1), c(0.025, 0.5, 0.975))) > rbind(pred0, pred1) > > # the upper bound of the prediction for x=0 is greater than the lower bound of the prediction for x=1, using both methods > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm