Jef Vlegels
2010-Aug-30 15:17 UTC
[R] 'mgcv' package, problem with predicting binomial (logit) data
Dear R-help list, I?m using the mgcv package to plot predictions based on the gam function. I predict the chance of being a (frequent) participant at theater plays vs. not being a participant by age. Because my outcome variable is dichotomous, I use the binomial family with logit link function. Dataset in attachment, code to read it in R: data <- read.spss("pas_r.sav") attach(data) In a first step I use ?gam? to model my data and ?predict? to calculate and plot the predicted values, this all works fine. My code looks like this: test <- gam(participant ~ s(age,fx=FALSE,bs='cr'), family=binomial(logit)) summary(test) plot(test, shade=TRUE) gam.check(test) test.pred <- predict(test,newdata=data,se.fit=TRUE,type='response', na.action=na.omit) I1<-order(age) plot(age[I1], test.pred$fit[I1],lty=1, type="l") lines(age[I1],test.pred$fit[I1]+2*M1pred$se[I1],lty=2) lines(age[I1],test.pred$fit[I1]-2*M1pred$se[I1],lty=2) plot(age[I1], test.pred$fit[I1] , type="l") I a second step, I want to calculate a similar model, but only for respondents with a certain characteristic. For example, in this case, only for male respondents. I use a code that looks like this: participant_male <- participant[gender=="male"] age_male <- age[gender=="male"] test2<-gam(participant_male ~ s(age_male, fx=FALSE, bs="cr"), family=binomial(logit), na.action=na.omit) summary(test2) plot(test2, shade=TRUE) I get a nice smoother function in this plot, like I expected. Then, when I want to plot the predicted values, I use a code that looks like this: Test2.pred <- predict(test5,se.fit=TRUE, type="response") I1<-order(age_male) plot(age_male[I1], test2.pred$fit[I1],lty=1) This last plot, of the predictions, is not what I expect. It?s just a random scatterplot, not what I would expect from the smoother plot. Does anybody know what I did wrong? Thanks in advance, Jef Vlegels Jef Vlegels Ghent University - Department of Sociology Korte Meer 3, B-9000 Gent, Belgium Tel:? 09 264 8343 www.psw.UGent.be/sociologie
Jef Vlegels
2010-Aug-30 15:44 UTC
[R] 'mgcv' package, problem with predicting binomial (logit) data
There was a problem with the data in attachment, here is a better version. Use something like: data <- read.table("pas_r.txt",header=TRUE,sep=";") to read it. Thanks, Jef -----Original Message----- From: David Winsemius [mailto:dwinsemius at comcast.net] Sent: maandag 30 augustus 2010 17:22 To: Jef Vlegels Subject: FYI offlist Re: [R] 'mgcv' package, problem with predicting binomial (logit) data On Aug 30, 2010, at 11:17 AM, Jef Vlegels wrote:> Dear R-help list, > > I'm using the mgcv package to plot predictions based on the gam > function. > > I predict the chance of being a (frequent) participant at theater > plays vs. > not being a participant by age. > Because my outcome variable is dichotomous, I use the binomial > family with > logit link function. > > Dataset in attachment, code to read it in R:Nope. .sav files are not accepted by the mail-server. See the Posting Guide. (.txt and .pdf are the safest formats)> data <- read.spss("pas_r.sav") > attach(data) > > In a first step I use 'gam' to model my data and 'predict' to > calculate and > plot the predicted values, this all works fine. > My code looks like this: > > test <- gam(participant ~ s(age,fx=FALSE,bs='cr'), > family=binomial(logit)) > summary(test) > plot(test, shade=TRUE) > gam.check(test) > test.pred <- predict(test,newdata=data,se.fit=TRUE,type='response', > na.action=na.omit) > I1<-order(age) > plot(age[I1], test.pred$fit[I1],lty=1, type="l") > lines(age[I1],test.pred$fit[I1]+2*M1pred$se[I1],lty=2) > lines(age[I1],test.pred$fit[I1]-2*M1pred$se[I1],lty=2) > plot(age[I1], test.pred$fit[I1] , type="l") > > I a second step, I want to calculate a similar model, but only for > respondents with a certain characteristic. For example, in this > case, only > for male respondents. > I use a code that looks like this: > > participant_male <- participant[gender=="male"] > age_male <- age[gender=="male"] > > test2<-gam(participant_male ~ s(age_male, fx=FALSE, bs="cr"), > family=binomial(logit), na.action=na.omit) > summary(test2) > plot(test2, shade=TRUE) > > I get a nice smoother function in this plot, like I expected. > Then, when I want to plot the predicted values, I use a code that > looks like > this: > > Test2.pred <- predict(test5,se.fit=TRUE, type="response") > I1<-order(age_male) > plot(age_male[I1], test2.pred$fit[I1],lty=1) > > This last plot, of the predictions, is not what I expect. It's just > a random > scatterplot, not what I would expect from the smoother plot. Does > anybody > know what I did wrong? > > Thanks in advance, > Jef Vlegels > > Jef Vlegels > Ghent University - Department of Sociology > Korte Meer 3, B-9000 Gent, Belgium > Tel: 09 264 8343 > www.psw.UGent.be/sociologie > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pas_r.txt URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100830/fcd2b6e1/attachment.txt>