Christine Sonvilla
2008-May-28 11:19 UTC
[R] confidence interval for the logit - predict.glm
Hello all, I've come across an online posting http://www.biostat.wustl.edu/archives/html/s-news/2001-10/msg00119.html that described how to get confidence intervals for predicted values from predict.glm. These instructions were meant for S-Plus. Yet, it generally seems to work with R too, but I am encountering some problems. I am explaining my procedure in the following and would be most grateful if anyone could reply to me on that. I have run glm models to predict fish species from a number of environmental predictors, using a stepwise approach (I am aware that there is some controversy going on about this). I have got a data set that is called "testfishes.txt", "predy.txt" contains the predictors for the testfishes only and "predallx.txt" contains all predictors for all planning units. I used the following commands: "stepmodfi" is the output of the stepwise approach predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE) upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit)) lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit)) predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE) fishoccur[,i] <- predictionresponse type="link" should be the equivalent to the S-Plus version of type="1", which was indicated in the online posting explaining this procedure. So I first tried to get predictions on the logit scale and there is already the point where I am a bit bewildered. Predictionlogit would only result in ONE value for every single planning unit contained in predallx.txt, when actually I would assume that there would be "planning unit times fishes" predicted values - at least this is what I get from predictionresponse. Predictionresponse generates as many predicted values as there are planning units times fishes (1680 planning units x 205 fish species, whihc are predicted over all these planning units), which I put into a constructed matrix named "fishoccur". In fact this is what I did in the very beginning, generating predicted values with the command predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE). Then I wanted to construct confidence intervals around these values. Therefore I first generated the logit predicted values, but that already is the point I am unsure about. The problem in fact is, that quite some of my upper values of the confidence interval result in NAs. The lower interval seems fine. Yet, I am not even sure if my approach is correct, regarding this single value that is being produced by the prediction on the logit scale. I hope that my descriptions were clear enough and would greatly appreciate if anyone could suggest how to tackle this, whether the approach itself is fine (which I believe indeed) and how to get proper lower and upper confidence values. Many thanks in advance! Christine -- Super-Aktion nur in der GMX Spieleflat: 10 Tage f?r 1 Euro. ?ber 180 Spiele downloaden: http://flat.games.gmx.de
Prof Brian Ripley
2008-May-28 14:54 UTC
[R] confidence interval for the logit - predict.glm
Possibly your calculation overflows: exp(upperlogit)/(1+exp(upperlogit)) could be replaced by 1/(1+exp(-upperlogit)), or even better by plogis(upperlogit). This could happen via the Hauck-Donner effect: the fitted probabilities are very near one and the standard errors are very large. As for your other points, please follow the posting guide and 'provide commented, minimal, self-contained, reproducible code'. On Wed, 28 May 2008, Christine Sonvilla wrote:> > Hello all, > > I've come across an online posting > > http://www.biostat.wustl.edu/archives/html/s-news/2001-10/msg00119.html > > that described how to get confidence intervals for predicted values from > predict.glm. These instructions were meant for S-Plus. Yet, it generally > seems to work with R too, but I am encountering some problems. I am > explaining my procedure in the following and would be most grateful if > anyone could reply to me on that. > > I have run glm models to predict fish species from a number of > environmental predictors, using a stepwise approach (I am aware that > there is some controversy going on about this). I have got a data set > that is called "testfishes.txt", "predy.txt" contains the predictors for > the testfishes only and "predallx.txt" contains all predictors for all > planning units. > > I used the following commands: "stepmodfi" is the output of the stepwise approach > > predictionlogit <- predict(stepmodfi, predallx, type="link", se.fit=TRUE) > > upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit > lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit > > upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit)) > lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))What is 'i' here?> predictionresponse <- predict(stepmodfi, predallx, type="response", se.fit=FALSE) > fishoccur[,i] <- predictionresponse > > > type="link" should be the equivalent to the S-Plus version of type="1", > which was indicated in the online posting explaining this procedure. > > So I first tried to get predictions on the logit scale and there is > already the point where I am a bit bewildered. Predictionlogit would > only result in ONE value for every single planning unit contained in > predallx.txt, when actually I would assume that there would be "planning > unit times fishes" predicted values - at least this is what I get from > predictionresponse. Predictionresponse generates as many predicted > values as there are planning units times fishes (1680 planning units x > 205 fish species, whihc are predicted over all these planning units), > which I put into a constructed matrix named "fishoccur". In fact this is > what I did in the very beginning, generating predicted values with the > command predictionresponse <- predict(stepmodfi, predallx, > type="response", se.fit=FALSE). Then I wanted to construct confidence > intervals around these values. Therefore I first generated the logit > predicted values, but that already is the point I am unsure about. > > The problem in fact is, that quite some of my upper values of the > confidence interval result in NAs. The lower interval seems fine. Yet, I > am not even sure if my approach is correct, regarding this single value > that is being produced by the prediction on the logit scale. > > I hope that my descriptions were clear enough and would greatly > appreciate if anyone could suggest how to tackle this, whether the > approach itself is fine (which I believe indeed) and how to get proper > lower and upper confidence values. > > Many thanks in advance! > > Christine > > -- > Super-Aktion nur in der GMX Spieleflat: 10 Tage f?r 1 Euro. > ?ber 180 Spiele downloaden: http://flat.games.gmx.de > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595