On Jun 30, 2009, at 4:54 AM, Renzi Fabrizio wrote:
> I would like to know how R computes the probability of an event in a
> logistic model (P(y=1)) from the score s, the linear combination of x
> and beta.
>
> I noticed that there are small differences (less than 1e-16) between
> the fitted values automatically computed in the glm procedure by R and
> the values "manually" computed by me applying the inverse formula
> p = e^s / (1 + e^s). Moreover, I noticed that the minimum of the
> fitted values in my estimation is 2.220446e-16, and there are many
> observations with this probability (whereas the minimum value obtained
> by "manual" estimation is 2.872636e-152).
It would be helpful to see at least a subset of the output from your
model and of your manual computations, so that we can compare the
results and see where the differences arise.
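For example, something along these lines would show the size and
location of the discrepancies (the names 'model' and 'p.manual' are
hypothetical, standing in for your fitted glm object and your
hand-computed probabilities):

# 'model' is your fitted glm object; 'p.manual' your manual probabilities
d <- fitted(model) - p.manual
summary(d)
# The largest absolute discrepancies
head(sort(abs(d), decreasing = TRUE))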
The model object returned from using glm() will contain both the
linear predictors on the link scale (model$linear.predictors) and the
fitted values (model$fitted.values). The latter can be accessed using
the fitted() extractor function.
To use an example, let's create a simple logistic regression model
using the infert data set, as referenced in ?glm:

model1 <- glm(case ~ spontaneous + induced, data = infert,
              family = binomial())
> model1

Call:  glm(formula = case ~ spontaneous + induced, family = binomial(),
    data = infert)

Coefficients:
(Intercept)  spontaneous      induced
    -1.7079       1.1972       0.4181

Degrees of Freedom: 247 Total (i.e. Null);  245 Residual
Null Deviance:      316.2
Residual Deviance:  279.6    AIC: 285.6
# Get the coefficients
> coef(model1)
(Intercept) spontaneous induced
-1.7078601 1.1972050 0.4181294
# Get the linear predictor values (log odds scale for a binomial glm)
> head(model1$linear.predictors)
1 2 3 4 5 6
1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564 0.32560375
You can also reproduce the above by multiplying the coefficients with
the model matrix:
# The full set of 248 values: coef(model1) %*% t(model.matrix(model1))
> head(as.vector(coef(model1) %*% t(model.matrix(model1))))
[1]  1.10467939 -1.28973068 -0.87160128 -0.87160128 -0.09252564  0.32560375
# Get the fitted response values (predicted probabilities)
> head(fitted(model1))
1 2 3 4 5 6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893
We can also get the fitted values from the linear predictor values by
using:
LP <- model1$linear.predictors
> head(exp(LP) / (1 + exp(LP)))
1 2 3 4 5 6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893
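As an aside, base R also has plogis(), the CDF of the logistic
distribution, which performs the same inverse logit transformation and
can be used in place of spelling out the formula:

# plogis(x) computes exp(x) / (1 + exp(x))
head(plogis(LP))

This returns the same six values as above.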
The same fitted values can also be obtained by using the predict.glm()
function with type = "response". The default type, "link", will give
you the linear predictor values as above. predict.glm() would
typically be used to generate predictions from the model on new data.
> head(predict(model1, type = "response"))
1 2 3 4 5 6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893
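For example, to get predicted probabilities for new covariate values
(the values below are simply made up for illustration):

# Hypothetical new observations for the two predictors
newdat <- data.frame(spontaneous = c(0, 1, 2), induced = c(0, 1, 0))
predict(model1, newdata = newdat, type = "response")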
In glm.fit(), which is the workhorse function in glm(), the fitted
values returned in the model object are actually computed by using the
inverse link function for the family passed to glm():
> binomial()$linkinv
function (eta)
.Call("logit_linkinv", eta, PACKAGE = "stats")
<environment: 0x171c8b10>
Thus:
> head(binomial()$linkinv(model1$linear.predictors))
1 2 3 4 5 6
0.7511359 0.2158984 0.2949212 0.2949212 0.4768851 0.5806893
So those are the same values as we saw above; all of the methods are
consistent with one another.
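Regarding the minimum fitted value of 2.220446e-16 that you observed,
which is .Machine$double.eps: if I recall correctly, the C code behind
logit_linkinv() (in src/library/stats/src/family.c) truncates the
linear predictor at +/- 30, presumably to keep the fitted
probabilities strictly inside (0, 1) during the IRLS iterations. Any
observation with a linear predictor below -30 is therefore assigned a
fitted value of about .Machine$double.eps, whereas your manual formula
keeps going:

# A linear predictor well below the -30 threshold
eta <- -350
binomial()$linkinv(eta)    # clamped to about .Machine$double.eps
exp(eta) / (1 + exp(eta))  # the naive formula yields a far smaller value

That would explain why many of your observations share the fitted
value 2.220446e-16, while your manual computation produces values as
small as 2.872636e-152.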
Perhaps the above provides some insight into how R does these
computations, and also illustrates that, as is frequently the case
with R, there is more than one way to get the same result.
HTH,
Marc Schwartz