Wayne Jones
2003-Oct-22 08:22 UTC
[R]: Prediction interval for a Gaussian family log-link model
Hi there fellow R-users, Can anyone tell me how to build a prediction interval for a gaussian log-link model for the reponse variable?? I can find the standard error of the predictions but I cant seem to find the prediction interval. Is there a way I can calculate the prediction interval from the standard errors?? Here's the example: logX<-rnorm(100) logY<--2-0.5*logX+rnorm(100,0,0.4) Y<-exp(logY) my.glm.mod<-glm(Y~logX,family=gaussian(link="log")) predict(my.glm.mod,type="response",se.fit=TRUE) Thanks, Wayne Dr Wayne R. Jones Senior Statistician / Research Analyst KSS Group plc St James's Buildings 79 Oxford Street Manchester M1 6SS Tel: +44(0)161 609 4084 Mob: +44(0)7810 523 713 KSS Ltd Seventh Floor St James's Buildings 79 Oxford Street Manchester M1 6SS England Company Registration Number 2800886 Tel: +44 (0) 161 228 0040 Fax: +44 (0) 161 236 6305 mailto:kssg@kssg.com http://www.kssg.com The information in this Internet email is confidential and m...{{dropped}}
Peter Dalgaard
2003-Oct-22 10:07 UTC
[R]: Prediction interval for a Gaussian family log-link model
Wayne Jones <JonesW at kssg.com> writes:> Hi there fellow R-users, > > Can anyone tell me how to build a prediction interval for a gaussian > log-link model for the reponse variable?? > I can find the standard error of the predictions but I cant seem to find the > prediction interval. Is there a way I can calculate the > prediction interval from the standard errors?? > > Here's the example: > > logX<-rnorm(100) > logY<--2-0.5*logX+rnorm(100,0,0.4) > Y<-exp(logY) > my.glm.mod<-glm(Y~logX,family=gaussian(link="log")) > predict(my.glm.mod,type="response",se.fit=TRUE)Er, that's not the right model for those data! Y <- exp(-2-0.5*logX)+rnorm(100,0,0.4) would be a log-link gaussian model (but a smaller SD would be more interesting). Assuming that this is what you intended, logX<-sort(rnorm(100)) Y <- exp(-2-0.5*logX)+rnorm(100,0,0.04) my.glm.mod<-glm(Y~logX,family=gaussian(link="log"), etastart=pmax(Y,0)) pp <- predict(my.glm.mod,type="response",se.fit=TRUE) bounds <- pp$fit + outer(sqrt(pp$residual.scale^2+pp$se.fit^2), qt(c(.05,.95),100-2)) plot(logX,Y) matlines(logX,bounds) BTW, we seem to have a problem with the gaussian(log) starting values if the observations are negative... Notice that this *only* works for gaussian responses and relies on the approximate normality of the predicted values. In general you have to compute the convolution of the error distribution with the distribution of the fitted value (possibly approximate). With discrete response variables, all bets are off: It is not at all clear what a prediction interval for binary response means. If you really intended a lognormal distribution for Y, and a loglinear relation with logX, just fit a linear model for log(Y) and transform everything back. -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907