Werner Wernersen
2008-Mar-11 08:58 UTC
[R] glm.fit: "fitted probabilities numerically 0 or 1 occurred"
Hi,

could anyone explain to me what this warning message exactly means and what the consequences are? Is it due to the fact that there are very extreme observations / outliers included, or what is the reason for it?

Thanks so much,
Werner
Prof Brian Ripley
2008-Mar-11 10:01 UTC
[R] glm.fit: "fitted probabilities numerically 0 or 1 occurred"
On Tue, 11 Mar 2008, Werner Wernersen wrote:

> Hi,
>
> could anyone explain to me what this warning message
> exactly means and what the consequences are?
> Is it due to the fact that there are very extreme
> observations / outliers included or what is the reason
> for it?

See MASS4 pp.197-8. (Assuming this is a binomial GLM: you did not say.)

> Thanks so much,
> Werner

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
(Ted Harding)
2008-Mar-11 10:08 UTC
[R] glm.fit: "fitted probabilities numerically 0 or 1 occurr
On 11-Mar-08 08:58:55, Werner Wernersen wrote:

> Hi,
>
> could anyone explain to me what this warning message
> exactly means and what the consequences are?
> Is it due to the fact that there are very extreme
> observations / outliers included or what is the reason
> for it?
>
> Thanks so much,
> Werner

What it means is exactly what it says. How it arises will probably be some variant of the following kind of data (I'm guessing that your application of glm() was to data with 0/1 responses, as in a logistic regression):

  X = 0.0  0.5  1.0  1.5  2.0  2.5  3.0 ...
  Y =  0    0    0    1    1    1    1  ...

i.e. all the 0's occur on one side of a value (say 1.25) of X, and all the 1's occur on the other side.

If you take a model (e.g. logistic):

  P(Y=1 | X) = exp((X-a)*b)/(1 + exp((X-a)*b))

then, for any finite values of a and b, the formula will give a value >0 for P(Y=1 | X) where X < 1.25 (i.e. where Y=0), so P(Y=0 | X) < 1; and a value <1 for P(Y=1 | X) where X > 1.25 (i.e. where Y=1).

However, if you take say a=1.25 (a value which separates the 0's from the 1's), and then let b -> infinity, then you will find that

  P(Y=0 | X) -> 1,  P(Y=1 | X) -> 0,  for X < 1.25
  P(Y=0 | X) -> 0,  P(Y=1 | X) -> 1,  for X > 1.25

so the limit as b -> inf perfectly predicts the observed outcome. However, the value of a is indeterminate so long as it is between the largest X for the Y=0 observations and the smallest X for the Y=1 observations.

This situation cannot arise with data where the largest X for which Y=0 is greater than the smallest X for which Y=1, e.g.

  X = 0.0  0.5  1.0  1.5  2.0  2.5  3.0 ...
  Y =  0    0    1    0    1    1    1  ...

The above example is a very simple example of what is called "linear separation". It arises more generally when there are several covariates X1, X2, ... , Xk and there is a linear function

  L = a1*X1 + a2*X2 + ... + ak*Xk

for which (with the data as observed) there is a value L0 such that

  Y = 0 for all the data such that L < L0
  Y = 1 for all the data such that L > L0

In particular, if ever the number of covariates (k) is greater than (n-2), where n is the number of cases in your data, then you have (k+1) or fewer points in k dimensions, and there will be a hyperplane L = L0 (of dimension k-1, with L as above) which will separate the (X1,...,Xk)-points where Y=0 from the (X1,...,Xk)-points where Y=1. Regardless of how you assign labels "Y=0" and "Y=1" to (k+1) or fewer points, they will be linearly separable (provided the points are in general position).

Even if k < n-1, so that they are not *in general* linearly separated, there is still a positive probability that you can get data for which they are linearly separated; and then the same situation arises. This probability increases as the number of covariates (k) increases.

What the warning message is telling you is that a perfect fit is possible within the parametrisation of the model: a probability P(Y=1)=0 is fitted to cases where the observed Y = 0; and a probability P(Y=1)=1 is fitted to cases where the observed Y = 1.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 11-Mar-08 Time: 10:08:04
------------------------------ XFMail ------------------------------
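
The separation argument in the post above can be checked numerically. What follows is only a minimal sketch in plain Python (standard library only), not R and not what glm.fit does internally. It evaluates the logistic model from the post at a fixed a = 1.25 for a few slopes b, on the two example data sets given there. The names prob_y1, log_likelihood, slopes and tol are made up for this illustration, and the tolerance of 10 times machine epsilon is an assumption chosen to mimic a "numerically 0 or 1" check.

```python
import math
import sys

def prob_y1(x, a, b):
    """Model from the post: P(Y=1 | X) = exp((X-a)*b) / (1 + exp((X-a)*b))."""
    z = (x - a) * b
    # Algebraically the same expression, arranged so exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(xs, ys, a, b):
    """Bernoulli log-likelihood of the 0/1 responses ys under the model."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = (x - a) * b
        # log P(Y=y | X=x) = y*z - log(1 + exp(z)), in an overflow-safe form.
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total

# The two example data sets from the post.
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y_separated = [0, 0, 0, 1, 1, 1, 1]   # all 0's below 1.25, all 1's above
y_overlap = [0, 0, 1, 0, 1, 1, 1]     # a 1 at X=1.0 and a 0 at X=1.5

a = 1.25                               # held fixed, purely for illustration
slopes = [1, 2, 5, 10, 50, 200]        # hypothetical grid of values for b

ll_separated = [log_likelihood(xs, y_separated, a, b) for b in slopes]
ll_overlap = [log_likelihood(xs, y_overlap, a, b) for b in slopes]

# Fitted probabilities for the largest slope, and an assumed tolerance
# for calling a probability "numerically 0 or 1".
fitted_at_200 = [prob_y1(x, a, 200) for x in xs]
tol = 10 * sys.float_info.epsilon
numerically_01 = all(p < tol or p > 1 - tol for p in fitted_at_200)

for b, ls, lo in zip(slopes, ll_separated, ll_overlap):
    print(f"b={b:>4}  separated: {ls:14.6e}   overlapping: {lo:14.6e}")
print("fitted probabilities at b=200:", fitted_at_200)
print("all numerically 0 or 1:", numerically_01)
```

For the separated data the log-likelihood keeps increasing towards 0 as b grows, so no finite b maximises it and the fitted probabilities are pushed to 0 or 1. For the overlapping data, along this same slice, the log-likelihood peaks at a moderate b and then falls away, because the two "misplaced" points are penalised more and more heavily.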