Michael Friendly
2013-Sep-18 20:06 UTC
[R] ggplot: stat_smooth(method='glm', ...) - plot linear predictor?
The code below uses ggplot with stat_smooth(method="glm", family=binomial, ...) to plot the data on survival of passengers on the Titanic, with the logistic regression curves for each sex on the scale of Pr(survived). This works (quite nicely!) because I've explicitly transformed the factor survived to 0/1 in the ggplot call. Some questions: - Is it possible, and if so, how, to plot the same data and fitted smooths on the logit scale, i.e., the linear predictor for the binomial glm? - the response, survived, is a factor. Is it possible to avoid using as.numeric(survived)-1 in the call to ggplot()? This is cosmetic, but requires an extra bit of explanation to use in teaching or writing. i.e., glm() is quite happy to fit the model survived ~ age+sex in the binomial family, and gives the same predicted probabilities and logits. install.packages("vcdExtra")# data from the most recent version, vcdExtra_0.5-11 data(Titanicp, package="vcdExtra") str(Titanicp) 'data.frame': 1309 obs. of 6 variables: $ pclass : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ... $ survived: Factor w/ 2 levels "died","survived": 2 2 1 1 1 2 2 1 2 1 ... $ sex : Factor w/ 2 levels "female","male": 1 2 1 2 1 2 1 2 1 2 ... $ age : num 29 0.917 2 30 25 ... $ sibsp : num 0 1 1 1 1 0 1 0 2 0 ... $ parch : num 0 2 2 2 2 0 0 0 0 0 ... > require(ggplot2) # remove missings on age Titanicp <- Titanicp[!is.na(Titanicp$age),] ggplot(Titanicp, aes(age, as.numeric(survived)-1, color=sex)) + stat_smooth(method="glm", family=binomial, formula=y~x, alpha=0.2, size=2, aes(fill=sex)) + geom_point(position=position_jitter(height=0.02, width=0), size=1.5) # equivalent logistic regression model, survived as a factor mod <- glm(survived ~ age+sex, family=binomial, data=Titanicp) summary(mod) -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
Michael Friendly
2013-Sep-19 13:19 UTC
[R] ggplot: stat_smooth(method='glm', ...) - plot linear predictor?
Thanks for the very helpful reply. Some comments inline. On 9/18/2013 8:53 PM, Dennis Murphy wrote:> Hi Michael: >> Some questions: > > - Is it possible, and if so, how, to plot the same data and fitted smooths > on the logit > scale, i.e., the linear predictor for the binomial glm? > Yes, but not through stat/geom_smooth directly - the geom only > provides a simple default mechanism. You can create a data frame using > predict.glm() with the default type, set se.fit = TRUE and then use > geom_line() and geom_ribbon() in ggplot2. See below.Hmm. I thought that ggplot had the facility to apply a transformation, and found coord_trans, which I think does what I want, more or less (except that geom_point doesn't work). logit <- function(x) log(x)/log(1-x) ggplot(Titanicp, aes(age, as.numeric(survived)-1, color=sex)) + stat_smooth(method="glm", family=binomial, formula=y~x, alpha=0.2, size=2, aes(fill=sex)) + # geom_point(position=position_jitter(height=0.02, width=0), size=1.5) + coord_trans(y="logit") + labs(x = "Age", y = "Estimated logit")> . We first need to get the predicted logits with corresponding SE > estimates into a data frame along with age and sex, and we should be > ready to go:... With this setup, I'd be happy to plot the observations on the logit scale jittered around some reasonable values, say \pm 1.5. However my attempt to do this doesn't work and I'm not sure why. mod <- glm(survived ~ age*sex, family=binomial, data=Titanicp) modp <- cbind(Titanicp[, c("survived", "sex", "age")], predict(mod, se.fit = TRUE)) modp$obs <- c(-1.5, 1.5)[modp$survived] # Plot predicted logits with corresponding Wald CIs p <- ggplot(modp, aes(x = age, y = fit, color = sex)) + geom_line(size = 2) + geom_ribbon(aes(ymin = fit - 1.96 * se.fit, ymax = fit + 1.96 * se.fit, fill = sex), alpha = 0.2, color = "transparent") + labs(x = "Age", y = "Estimated logit") p + geom_point(y=modp$obs, position=position_jitter(height=0.02, width=0))>p + geom_point(y=modp$obs, position=position_jitter(height=0.02, width=0))Error: position_jitter requires the following missing aesthetics: y>p + geom_point(y=modp$obs, position=position_jitter(y=modp$obs, height=0.02, width=0))Error in position_jitter(y = modp$obs, height = 0.02, width = 0) : unused argument (y = modp$obs)> > Dennis > >> i.e., glm() is quite happy to fit the model survived ~ age+sex >> in the binomial family, and gives the same predicted probabilities and >> logits. >> >> install.packages("vcdExtra")# data from the most recent version, >> vcdExtra_0.5-11 >> data(Titanicp, package="vcdExtra") >> str(Titanicp) >> >> 'data.frame': 1309 obs. of 6 variables: >> $ pclass : Factor w/ 3 levels "1st","2nd","3rd": 1 1 1 1 1 1 1 1 1 1 ... >> $ survived: Factor w/ 2 levels "died","survived": 2 2 1 1 1 2 2 1 2 1 ... >> $ sex : Factor w/ 2 levels "female","male": 1 2 1 2 1 2 1 2 1 2 ... >> $ age : num 29 0.917 2 30 25 ... >> $ sibsp : num 0 1 1 1 1 0 1 0 2 0 ... >> $ parch : num 0 2 2 2 2 0 0 0 0 0 ... >> >> require(ggplot2) >> # remove missings on age >> Titanicp <- Titanicp[!is.na(Titanicp$age),] >> >> ggplot(Titanicp, aes(age, as.numeric(survived)-1, color=sex)) + >> stat_smooth(method="glm", family=binomial, formula=y~x, alpha=0.2, >> size=2, aes(fill=sex)) + >> geom_point(position=position_jitter(height=0.02, width=0), size=1.5) >> >> # equivalent logistic regression model, survived as a factor >> mod <- glm(survived ~ age+sex, family=binomial, data=Titanicp) >> summary(mod) >>-- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. & Chair, Quantitative Methods York University Voice: 416 736-2100 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA [[alternative HTML version deleted]]