Winter, Katherine
2009-May-27 10:22 UTC
[R] Warning message as a result of logistic regression performed
I am sorry if this question sounds basic, but I am having trouble understanding a warning message I have been receiving in R after attempting logistic regression.

I have been using the logistic regression function in R to analyse a simulated data set. The dependent variable "failure" has an outcome of either 0 (success) or 1 (failure). Both the independent variables have been previously generated in a mathematical model and stored in a data.frame for analysis. I am currently using a sample size of 1000 and I use the following commands in R:

    log.reg.1 <- glm(failure ~ age + weight + init.para.log.value + k.d1, family = binomial(logit), data = test)
    log.reg.1.summary <- summary(log.reg.1); print(log.reg.1.summary)
    log.reg.1.exp <- exp(log.reg.1$coef); print(log.reg.1.exp)

When I execute these commands I get the following warning message:

    "In glm.fit(x = X, y = Y, weights = weights, start = start, etastart = etastart, :
      fitted probabilities numerically 0 or 1 occurred"

I am unsure what this warning is referring to. I have tried using Google to answer this question but have had no luck.

I have been on the following website, https://stat.ethz.ch/pipermail/r-sig-ecology/2008-July/000278.html, but did not find it helpful: when I ran the example given I received no warning message (I am using R version 2.8.1).

I am working with simulated data, so there are no missing values in the data set.

I have also looked at the following website, http://tolstoy.newcastle.edu.au/R/help/05/07/7759.html, where they suggest that the warning is a result of "perfect separation" of the results (a possibility with simulated data). However, when I added an extra row to my data.frame of results that I knew to be false, and hence should prevent "perfect separation", subsequent logistic regression still resulted in the same warning message.

I am still at a loss as to the meaning of this message, and any help in understanding this warning would be much appreciated.
Gavin Simpson
2009-May-27 14:24 UTC
[R] Warning message as a result of logistic regression performed
Try reading this thread:

http://thread.gmane.org/gmane.comp.lang.r.general/134368/focus=134475

especially the posts by I. Kosmidis, which show how to diagnose problems in logit model fits like this. There is also a statement about this warning in ?glm, along with a pointer to a reference that discusses one source of the warning.

G

--
Dr. Gavin Simpson
ECRC, UCL Geography
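As a rough illustration of what the warning refers to, the sketch below builds a small, deliberately separated toy data set and fits it with glm(). It is not the code from the linked thread. The names toy, x, msgs and n_extreme are made up for this example, and the 1e-8 cut-off is an arbitrary choice for inspection, not the tolerance glm.fit itself uses.

```r
# Toy data (hypothetical): failure is 1 exactly when x > 0, so a single
# predictor separates the two outcomes perfectly.
x <- c(seq(-5, -0.5, length.out = 50), seq(0.5, 5, length.out = 50))
toy <- data.frame(x = x, failure = as.integer(x > 0))

# Collect the warnings raised during fitting instead of printing them.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(failure ~ x, family = binomial(link = "logit"), data = toy),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(msgs)

# Count fitted probabilities that are indistinguishable from 0 or 1.
# The 1e-8 threshold is an assumption chosen for this illustration.
p <- fitted(fit)
n_extreme <- sum(p < 1e-8 | p > 1 - 1e-8)
print(n_extreme)

# Very large estimates with very large standard errors are a typical
# symptom of (quasi-)separation.
print(summary(fit)$coefficients)
```

The same checks can be run on the real fit, for example range(fitted(log.reg.1)) and the standard errors in summary(log.reg.1). Note that the warning only says that some fitted probabilities are numerically 0 or 1. That can also happen when most, but not all, observations are separated by the predictors, which may be why adding a single contradicting row did not make it go away.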