Hi!

I was trying to implement a probit model on a dichotomous outcome variable and found that the predictions were outside the (0,1) interval that one should get. I later tried it with some simulated data with a similar result.

Here is a toy program I wrote and I can't figure why I should be getting such odd predictions.

x1<-rnorm(1000)
x2<-rnorm(1000)
x3<-rnorm(1000)
x4<-rnorm(1000)
x5<-rnorm(1000)
x6<-rnorm(1000)
e1<-rnorm(1000)/3
e2<-rnorm(1000)/3
e3<-rnorm(1000)/3
y<-1-(1-pnorm(-2+0.33*x1+0.66*x2+1*x3+e1)*1-(pnorm(1+1.5*x4-0.25*x5+e2)*pnorm(1+0.2*x6+e3)))
y <- y>runif(1000)
dat<-data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
g<-glm(y~., data = dat, family = binomial)
summary(g)
yhat<-predict(g, dat)

Call:
glm(formula = y ~ ., family = binomial, data = dat)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8383  -1.3519   0.7638   0.9249   1.3698

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.71749    0.06901  10.397  < 2e-16 ***
x1           0.10211    0.07057   1.447  0.14791
x2           0.21068    0.07177   2.936  0.00333 **
x3           0.35162    0.07070   4.974 6.57e-07 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1275.3  on 999  degrees of freedom
Residual deviance: 1239.4  on 996  degrees of freedom
AIC: 1247.4

Number of Fisher Scoring iterations: 4

> yhat<-predict(g, dat)
>
> range(yhat)
[1] -0.4416826  2.0056527
> range(y)
[1] 0 1

Any advice would be really helpful.

thanks
Arnab
>>>>> "AM" == Arnab mukherji <arnab at myrealbox.com> writes:

    AM> Any advice would be really helpful.

Read the documentation of predict and then the one of predict.glm? ;-)

I guess you actually wanted to do one of the following:

> yhat<-predict(g, dat, type="response")
> range(yhat)
[1] 0.2760238 0.9229622

or,

> yhat<-fitted(g)
> range(yhat)
[1] 0.2760238 0.9229622

Cheers,

        Berwin
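The two calls suggested above can be compared side by side. The following is a minimal sketch on a simplified simulation; the seed, sample size, and coefficients are assumptions chosen for illustration and are not the original poster's data-generating process.

```r
# Illustrative simulation: seed and coefficients are assumptions, not the original data.
set.seed(1)
n  <- 1000
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
p  <- plogis(0.7 + 0.1 * x1 + 0.2 * x2 + 0.35 * x3)
y  <- runif(n) < p
dat <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)

g <- glm(y ~ ., data = dat, family = binomial)

eta  <- predict(g, dat)                     # default type = "link": the linear predictor
prob <- predict(g, dat, type = "response")  # predicted probabilities

range(eta)   # not restricted to (0, 1)
range(prob)  # inside (0, 1)

# For the data used in the fit, fitted() gives the same probabilities.
max(abs(prob - fitted(g)))
```

fitted(g) only covers the observations used in the fit, so predict(..., type = "response") is the call to use when predicting for new data.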
On Fri, 5 Mar 2004, Arnab mukherji wrote:

> I was trying to implement a probit model on a dichotomous outcome
> variable and found that the predictions were outside the (0,1) interval
> that one should get. I later tried it with some simulated data with a
> similar result.
>
> Here is a toy program I wrote and I can't figure why I should be getting
> such odd predictions.

Did it occur to you to read the help page? The default type of prediction is "link", not "response". Do try ?predict.glm.

...
> yhat<-predict(g, dat)
...

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
Arnab mukherji wrote:

> Hi!
>
> I was trying to implement a probit model on a dichotomous outcome variable and found that the predictions were outside the (0,1) interval that one should get. I later tried it with some simulated data with a similar result.
>
> Here is a toy program I wrote and I can't figure why I should be getting such odd predictions.
>

Let me be the first to write "read the help file". There are several scales that you can predict on in a GLM. The relevant part of the help file is this:

    type: the type of prediction required.  The default is on the scale
          of the linear predictors; the alternative `"response"' is on
          the scale of the response variable.  Thus for a default
          binomial model the default predictions are of log-odds
          (probabilities on logit scale) and `type = "response"' gives
          the predicted probabilities.

Your predictions are on the scale of the linear predictor, i.e. the link scale. With family = binomial as you fitted it, that is the logit scale; for a probit fit it would be the probit scale.

Bob

--
Bob O'Hara

Dept. of Mathematics and Statistics
P.O. Box 4 (Yliopistonkatu 5)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 23743
Mobile: +358 50 599 0540
Fax: +358-9-191 22 779
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: http://www.jnr-eeb.org
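To make the two scales in the help text concrete, here is a small sketch that converts link-scale predictions to probabilities by hand. The data are made up for illustration (the seed and coefficients are assumptions), and the object names are hypothetical.

```r
# Made-up data for illustration only.
set.seed(2)
x <- rnorm(200)
y <- rbinom(200, size = 1, prob = plogis(0.5 + x))
fit <- glm(y ~ x, family = binomial)   # default link is the logit

log_odds <- predict(fit)                     # type = "link" is the default
prob     <- predict(fit, type = "response")  # predicted probabilities

# Three equivalent ways to map log-odds back to probabilities.
by_hand <- 1 / (1 + exp(-log_odds))          # inverse logit written out
via_fn  <- plogis(log_odds)                  # logistic distribution function
via_fam <- family(fit)$linkinv(log_odds)     # the family's own inverse link

# Each difference should be zero up to rounding error.
max(abs(by_hand - prob))
max(abs(via_fn  - prob))
max(abs(via_fam - prob))
```

Using family(fit)$linkinv has the advantage that it keeps working unchanged if the model is refitted with a different link.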
Dear Arnab,

Several people have already noted that you're getting predicted values on the wrong scale. Note, as well, that you fit a logit model rather than a probit model; for a probit model, you need family=binomial(probit), since the logit link is the canonical link for the binomial family.

John

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Arnab mukherji
> Sent: Friday, March 05, 2004 2:48 AM
> To: r-help at stat.math.ethz.ch
> Cc: r-help at stat.math.ethz.ch
> Subject: [R] Probit predictions outside (0,1) interval
>
> Hi!
>
> I was trying to implement a probit model on a dichotomous
> outcome variable and found that the predictions were outside
> the (0,1) interval that one should get. I later tried it with
> some simulated data with a similar result.
>
> Here is a toy program I wrote and I can't figure why I should
> be getting such odd predictions.
>
> x1<-rnorm(1000)
> x2<-rnorm(1000)
> x3<-rnorm(1000)
> x4<-rnorm(1000)
> x5<-rnorm(1000)
> x6<-rnorm(1000)
> e1<-rnorm(1000)/3
> e2<-rnorm(1000)/3
> e3<-rnorm(1000)/3
> y<-1-(1-pnorm(-2+0.33*x1+0.66*x2+1*x3+e1)*1-(pnorm(1+1.5*x4-0.25*x5+e2)*pnorm(1+0.2*x6+e3)))
> y <- y>runif(1000)
> dat<-data.frame(y = y, x1 = x1, x2 = x2, x3 = x3)
> g<-glm(y~., data = dat, family = binomial)
> summary(g)
> yhat<-predict(g, dat)
>
>
> Call:
> glm(formula = y ~ ., family = binomial, data = dat)
>
> Deviance Residuals:
>     Min       1Q   Median       3Q      Max
> -1.8383  -1.3519   0.7638   0.9249   1.3698
>
> Coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)  0.71749    0.06901  10.397  < 2e-16 ***
> x1           0.10211    0.07057   1.447  0.14791
> x2           0.21068    0.07177   2.936  0.00333 **
> x3           0.35162    0.07070   4.974 6.57e-07 ***
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> (Dispersion parameter for binomial family taken to be 1)
>
>     Null deviance: 1275.3  on 999  degrees of freedom
> Residual deviance: 1239.4  on 996  degrees of freedom
> AIC: 1247.4
>
> Number of Fisher Scoring iterations: 4
>
> > yhat<-predict(g, dat)
> >
> > range(yhat)
> [1] -0.4416826  2.0056527
> > range(y)
> [1] 0 1
>
> Any advice would be really helpful.
>
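As a rough sketch of the point about the link function, the following fits both a logit and a probit model to the same illustrative data. The data are simulated from a probit model with assumed coefficients and seed; none of the values come from the original post.

```r
# Illustrative data generated from a probit model; all values are assumptions.
set.seed(3)
n  <- 1000
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = pnorm(0.5 + 0.3 * x1 - 0.4 * x2))
dat <- data.frame(y, x1, x2)

logit_fit  <- glm(y ~ ., data = dat, family = binomial)                   # logit (default)
probit_fit <- glm(y ~ ., data = dat, family = binomial(link = "probit"))  # probit

family(logit_fit)$link   # "logit"
family(probit_fit)$link  # "probit"

# On the probit link scale a prediction is a standard normal quantile,
# so pnorm() maps it to a probability.
z    <- predict(probit_fit, dat)                     # link scale
prob <- predict(probit_fit, dat, type = "response")  # probability scale
max(abs(pnorm(z) - prob))   # zero up to rounding error
range(prob)                 # inside (0, 1)
```

With either link, type = "response" is what returns values in (0,1); changing the link only changes how the linear predictor is mapped to a probability.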