thr3ads.net - R help - [R] confidence interval for the logit

If this information is useful, please help other people find it:
Share via:

Christine Sonvilla

2008-May-28 11:19 UTC

[R] confidence interval for the logit - predict.glm

Hello all,

I've come across an online posting 
 
http://www.biostat.wustl.edu/archives/html/s-news/2001-10/msg00119.html

that described how to get confidence intervals for predicted values from
predict.glm. These instructions were meant for S-Plus. Yet, it generally seems
to work with R too, but I am encountering some problems. I am explaining my
procedure in the following and would be most grateful if anyone could reply to
me on that.

I have run glm models to predict fish species from a number of environmental
predictors, using a stepwise approach (I am aware that there is some controversy
going on about this). I have got a data set that is called
"testfishes.txt", "predy.txt" contains the predictors for
the testfishes only and "predallx.txt" contains all predictors for all
planning units.

I used the following commands: "stepmodfi" is the output of the
stepwise approach

predictionlogit <- predict(stepmodfi, predallx, type="link",
se.fit=TRUE)

upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit
lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit

upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit))
lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))

predictionresponse <- predict(stepmodfi, predallx, type="response",
se.fit=FALSE)
fishoccur[,i] <- predictionresponse


type="link" should be the equivalent to the S-Plus version of
type="1", which was indicated in the online posting explaining this
procedure.

So I first tried to get predictions on the logit scale and there is already the
point where I am a bit bewildered. Predictionlogit would only result in ONE
value for every single planning unit contained in predallx.txt, when actually I
would assume that there would be "planning unit times fishes"
predicted values - at least this is what I get from predictionresponse.
Predictionresponse generates as many predicted values as there are planning
units times fishes (1680 planning units x 205 fish species, whihc are predicted
over all these planning units), which I put into a constructed matrix named
"fishoccur".
In fact this is what I did in the very beginning, generating predicted values
with the command predictionresponse <- predict(stepmodfi, predallx,
type="response", se.fit=FALSE).
Then I wanted to construct confidence intervals around these values.
Therefore I first generated the logit predicted values, but that already is the
point I am unsure about.

The problem in fact is, that quite some of my upper values of the confidence
interval result in NAs. The lower interval seems fine. Yet, I am not even sure
if my approach is correct, regarding this single value that is being produced by
the prediction on the logit scale.

I hope that my descriptions were clear enough and would greatly appreciate if
anyone could suggest how to tackle this, whether the approach itself is fine
(which I believe indeed) and how to get proper lower and upper confidence
values.

Many thanks in advance!

Christine

-- 
Super-Aktion nur in der GMX Spieleflat: 10 Tage f?r 1 Euro.
?ber 180 Spiele downloaden: http://flat.games.gmx.de

Prof Brian Ripley

2008-May-28 14:54 UTC

head link

[R] confidence interval for the logit - predict.glm

Possibly your calculation overflows: exp(upperlogit)/(1+exp(upperlogit)) 
could be replaced by 1/(1+exp(-upperlogit)), or even better by 
plogis(upperlogit).  This could happen via the Hauck-Donner effect: the 
fitted probabilities are very near one and the standard errors are very 
large.

As for your other points, please follow the posting guide and
'provide commented, minimal, self-contained, reproducible code'.

On Wed, 28 May 2008, Christine Sonvilla wrote:
>
> Hello all,
>
> I've come across an online posting
>
> http://www.biostat.wustl.edu/archives/html/s-news/2001-10/msg00119.html
>
> that described how to get confidence intervals for predicted values from 
> predict.glm. These instructions were meant for S-Plus. Yet, it generally 
> seems to work with R too, but I am encountering some problems. I am 
> explaining my procedure in the following and would be most grateful if 
> anyone could reply to me on that.
>
> I have run glm models to predict fish species from a number of 
> environmental predictors, using a stepwise approach (I am aware that 
> there is some controversy going on about this). I have got a data set 
> that is called "testfishes.txt", "predy.txt" contains
the predictors for
> the testfishes only and "predallx.txt" contains all predictors
for all
> planning units.
>
> I used the following commands: "stepmodfi" is the output of the
stepwise approach
>
> predictionlogit <- predict(stepmodfi, predallx, type="link",
se.fit=TRUE)
>
> upperlogit <- predictionlogit$fit + 2*predictionlogit$se.fit
> lowerlogit <- predictionlogit$fit - 2*predictionlogit$se.fit
>
> upperresponse[,i] <- exp(upperlogit)/(1+exp(upperlogit))
> lowerresponse[,i] <- exp(lowerlogit)/(1+exp(lowerlogit))
What is 'i' here?
> predictionresponse <- predict(stepmodfi, predallx,
type="response", se.fit=FALSE)
> fishoccur[,i] <- predictionresponse
>
>
> type="link" should be the equivalent to the S-Plus version of
type="1",
> which was indicated in the online posting explaining this procedure.
>
> So I first tried to get predictions on the logit scale and there is 
> already the point where I am a bit bewildered. Predictionlogit would 
> only result in ONE value for every single planning unit contained in 
> predallx.txt, when actually I would assume that there would be
"planning
> unit times fishes" predicted values - at least this is what I get from
> predictionresponse. Predictionresponse generates as many predicted 
> values as there are planning units times fishes (1680 planning units x 
> 205 fish species, whihc are predicted over all these planning units), 
> which I put into a constructed matrix named "fishoccur". In fact
this is
> what I did in the very beginning, generating predicted values with the 
> command predictionresponse <- predict(stepmodfi, predallx, 
> type="response", se.fit=FALSE). Then I wanted to construct
confidence
> intervals around these values. Therefore I first generated the logit 
> predicted values, but that already is the point I am unsure about.
>
> The problem in fact is, that quite some of my upper values of the 
> confidence interval result in NAs. The lower interval seems fine. Yet, I 
> am not even sure if my approach is correct, regarding this single value 
> that is being produced by the prediction on the logit scale.
>
> I hope that my descriptions were clear enough and would greatly 
> appreciate if anyone could suggest how to tackle this, whether the 
> approach itself is fine (which I believe indeed) and how to get proper 
> lower and upper confidence values.
>
> Many thanks in advance!
>
> Christine
>
> -- 
> Super-Aktion nur in der GMX Spieleflat: 10 Tage f?r 1 Euro.
> ?ber 180 Spiele downloaden: http://flat.games.gmx.de
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Maybe Matching Threads

cv.glm function error message in a loop

R help - May 2008 - confidence interval for the logit - predict.glm

[R] confidence interval for the logit - predict.glm

[R] confidence interval for the logit - predict.glm

Maybe Matching Threads