I'd like to thank John Fox and Chuck Cleland for their help in resolving this issue. It turned out to be something simple, but perhaps others have had similar problems.

In my original data frame, I had 4 categories of race/ethnicity. One of the categories (other) was very small, and not similar to any of the other three categories, so I created a new data frame deleting those people.

However, the level "other" was still there, with no one in it. This didn't cause a problem for glm or lm, but it did for polr. When I eliminated that level, the problem disappeared.

Thanks again for the help

Peter

Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)
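The situation described above can be reproduced in a few lines of base R. This is only a minimal sketch: the data frame `dat`, its columns `race` and `y`, and the category labels are made up for illustration and are not the original data. It shows that subsetting rows leaves the unused level attached to the factor, and that re-creating the factor removes it.

```r
# Hypothetical data: four race/ethnicity categories, one of them very small.
dat <- data.frame(
  race = factor(c("black", "white", "hispanic", "other",
                  "white", "black", "hispanic")),
  y    = c(0, 2, 1, 5, 0, 3, 1)
)

# Deleting the "other" rows removes the people but not the level.
dat3 <- dat[dat$race != "other", ]
levels_before <- levels(dat3$race)  # still includes "other"
table(dat3$race)                    # "other" is listed with a count of 0

# Re-creating the factor keeps only the levels that actually occur.
dat3$race <- factor(dat3$race)
levels(dat3$race)                   # "other" is gone
```

In later versions of R (2.12.0 onward), `droplevels(dat3)` does the same thing for every factor in a data frame at once.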
On Fri, 8 Oct 2004, Peter Flom wrote:

> I'd like to thank John Fox and Chuck Cleland for their help in resolving
> this issue. It turned out to be something simple, but perhaps others
> have had similar problems.
>
> In my original data frame, I had 4 categories of race/ethnicity. One of
> the categories (other) was very small, and not similar to any of the
> other three categories, so I created a new data frame deleting those
> people.
>
> However, the level "other" was still there, with no one in it.
> This didn't cause a problem for glm or lm, but it did for polr. When I
> eliminated that level, the problem disappeared.

How did you use `glm or lm' for an ordered factor response?

An empty factor level will certainly cause glm problems, depending on which one it is. An empty level will always cause polr problems, as there is no MLE under those circumstances. I will add a sanity check in due course.

-- 
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
Prof Brian Ripley <ripley at stats.ox.ac.uk> 10/09/04 3:18 AM asked:

<<< How did you use `glm or lm' for an ordered factor response? An empty factor level will certainly cause glm problems, depending on which one it is. An empty level will always cause polr problems, as there is no MLE under those circumstances. I will add a sanity check in due course. >>>

The analyses were part of a paper I am writing, illustrating that, when the DV is oddly distributed (the DV in question was a count, with many 0's and a long right tail), the 'usual' methods are not only wrong for statistical reasons (such as grossly violating model assumptions) but also give bad results.

While this is widely known to statisticians, in the fields in which I work, people sometimes analyze such variables using either OLS regression (hence lm), or by categorizing the DV into something like 0, 1, 2, more than 2 (hence the need for polr). I also tried Poisson regression and negative binomial regression (hence glm).

The empty level of the IV only caused a problem for polr.

Thanks

Peter
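For readers who want to try the comparison described above, here is a rough sketch of the four kinds of fit on simulated data. Everything in it is an assumption made for illustration: the data are simulated rather than taken from the paper, the names `dat`, `race`, `count` and `count_cat` are invented, and the negative binomial fit uses `glm.nb()` from MASS as one common choice, which may not be what was actually used. The check before the fits confirms that no factor level is empty.

```r
library(MASS)  # provides polr() and glm.nb()

set.seed(1)
n <- 300

# Simulated stand-in data: a three-level IV and an overdispersed count DV
# with many zeros and a long right tail.
dat <- data.frame(
  race  = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  count = rnbinom(n, size = 0.5, mu = 1.5)
)

# Categorize the count as 0, 1, 2, more than 2 (ordered factor for polr).
dat$count_cat <- cut(dat$count,
                     breaks = c(-Inf, 0, 1, 2, Inf),
                     labels = c("0", "1", "2", ">2"),
                     ordered_result = TRUE)

# Sanity check: stop if any level of the IV or the categorized DV is empty.
stopifnot(all(table(dat$race) > 0), all(table(dat$count_cat) > 0))

fit_ols  <- lm(count ~ race, data = dat)                    # OLS regression
fit_pois <- glm(count ~ race, family = poisson, data = dat) # Poisson regression
fit_nb   <- glm.nb(count ~ race, data = dat)                # negative binomial
fit_polr <- polr(count_cat ~ race, data = dat, Hess = TRUE) # ordinal logistic
```

If the `stopifnot()` line fails on real data, re-creating the offending factor with `factor()` before fitting removes the empty level.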