Christoph Lehman had problems with separated data in two-class logistic
regression.

One useful little trick is to penalize the logistic regression using a
quadratic penalty on the coefficients. I am sure there are functions in the R
contributed libraries to do this; otherwise it is easy to achieve via IRLS
using ridge regressions. Then even though the data are separated, the
penalized log-likelihood has a unique maximum. One intriguing feature is that
as the penalty parameter goes to zero, the solution converges to the SVM
solution, i.e. the optimal separating hyperplane; see
http://www-stat.stanford.edu/~hastie/Papers/margmax1.ps

--------------------------------------------------------------------
  Trevor Hastie                              hastie@stanford.edu
  Professor, Department of Statistics, Stanford University
  Phone: (650) 725-2231 (Statistics)         Fax: (650) 725-8977
         (650) 498-5233 (Biostatistics)      Fax: (650) 725-6951
  URL:   http://www-stat.stanford.edu/~hastie
  address: room 104, Department of Statistics, Sequoia Hall
           390 Serra Mall, Stanford University, CA 94305-4065
--------------------------------------------------------------------
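[A minimal sketch of the IRLS-with-ridge idea described in the post above.
The function ridge_logit, its argument names and defaults are illustrative
only and not from the original post; each IRLS step is a ridge regression on
the working response, with the intercept left unpenalized.]

ridge_logit <- function(X, y, lambda = 1e-2, maxit = 50, tol = 1e-8) {
  ## X: numeric matrix of covariates (n x k); y: 0/1 response vector
  X <- cbind(1, as.matrix(X))             # prepend an intercept column
  p <- ncol(X)
  beta <- rep(0, p)
  P <- diag(c(0, rep(lambda, p - 1)))     # quadratic penalty; intercept unpenalized
  for (it in seq_len(maxit)) {
    eta <- drop(X %*% beta)
    mu  <- plogis(eta)                    # fitted probabilities
    w   <- mu * (1 - mu)                  # IRLS weights
    z   <- eta + (y - mu) / w             # working response
    ## ridge-regression step: solve (X'WX + P) beta = X'Wz
    beta_new <- solve(crossprod(X, w * X) + P, crossprod(X, w * z))
    if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  drop(beta)
}

## e.g.  b <- ridge_logit(X, y, lambda = 1e-3)
## With separated data the penalty term keeps each working ridge regression
## well conditioned even as the unpenalized fit would diverge.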
On Sat, 13 Sep 2003, Trevor Hastie wrote:

> Christoph Lehman had problems with separated data in two-class logistic
> regression.
>
> One useful little trick is to penalize the logistic regression using a
> quadratic penalty on the coefficients. I am sure there are functions in
> the R contributed libraries to do this;

Using nnet/multinom with weight decay does exactly this.

> otherwise it is easy to achieve via IRLS using ridge regressions. Then
> even though the data are separated, the penalized log-likelihood has a
> unique maximum. One intriguing feature is that as the penalty parameter
> goes to zero, the solution converges to the SVM solution - i.e. the
> optimal separating hyperplane; see
> http://www-stat.stanford.edu/~hastie/Papers/margmax1.ps

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
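[A minimal illustration of the nnet/multinom suggestion above; the data frame
d and the variables y, x1, x2 are assumed for the example and are not from
the thread.]

library(nnet)
## `decay` is passed through multinom() to nnet() and imposes the quadratic
## (weight-decay) penalty on the coefficients
fit <- multinom(y ~ x1 + x2, data = d, decay = 1e-2)
summary(fit)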
On Sun, 14 Sep 2003 08:17:20 +0100 (BST) Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:

> On Sat, 13 Sep 2003, Trevor Hastie wrote:
>
> > Christoph Lehman had problems with separated data in two-class logistic
> > regression.
> >
> > One useful little trick is to penalize the logistic regression using a
> > quadratic penalty on the coefficients. I am sure there are functions in
> > the R contributed libraries to do this;
>
> Using nnet/multinom with weight decay does exactly this.

Also the lrm function in the Design package will do quadratic penalization.

Frank Harrell

> > otherwise it is easy to achieve via IRLS using ridge regressions. Then
> > even though the data are separated, the penalized log-likelihood has a
> > unique maximum. One intriguing feature is that as the penalty parameter
> > goes to zero, the solution converges to the SVM solution - i.e. the
> > optimal separating hyperplane; see
> > http://www-stat.stanford.edu/~hastie/Papers/margmax1.ps

---
Frank E Harrell Jr   Professor and Chair          School of Medicine
                     Department of Biostatistics  Vanderbilt University
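[A minimal illustration of the penalized lrm fit mentioned above; the data
frame d, the variables, and the penalty value 1 are assumed for the example.]

library(Design)
f <- lrm(y ~ x1 + x2, data = d, penalty = 1)   # quadratic (ridge) penalty
f

## The package's pentrace() function can be used to choose the penalty value
## over a grid, e.g. pentrace(f, seq(0, 10, by = 0.5)).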
> in the paper "Avoiding the effects of concurvity in GAM's .." of
> Figueiras et al. (2003) it is mentioned that in GLM collinearity is taken
> into account in the calculation of the s.e.'s but not in GAM (-> results
> in confidence intervals that are too narrow, p-values understated; GAM,
> S-Plus version). I haven't found any references to GAM and concurvity or
> collinearity on the R page. And I wonder if the R version of GAM differs
> in this point.

The penalized regression spline representation means that it's easy to
calculate the `correct' s.e.'s, and this is what is done. The covariance
matrix used is based on a Bayesian model of smoothing, generalized from
Silverman (1985, JRSSB) (and, less closely, Wahba, 1983, JRSSB), so the
s.e.'s are generally a little larger than you'd get if you just pretended
that the GAM was an un-penalized GLM (this widening generally improves CI
performance).

As Thomas Lumley pointed out, the s.e.'s don't take into account smoothing
parameter estimation uncertainty. In simulation studies this uncertainty
seems to have very little effect on the realized coverage probabilities of
confidence intervals that are in some sense `whole model' intervals, but the
performance of CIs for component functions of the GAM can be quite a long
way from nominal. There's a simple, not-very-computer-intensive fix for this
which removes the conditioning on the smoothing parameters and greatly
improves component-wise coverage probabilities... implementation is on my
`to-do' list (might wait to see what the referees say, though!)

Simon

ps. mgcv 0.9 out now! (changes list linked to my www page)

_____________________________________________________________________
> Simon Wood   simon at stats.gla.ac.uk   www.stats.gla.ac.uk/~simon/
> Department of Statistics, University of Glasgow, Glasgow, G12 8QQ
> Direct telephone: (0)141 330 4530    Fax: (0)141 330 4814
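[A minimal illustration of where those penalization-aware s.e.'s show up in
an mgcv fit; the data frame dat and the variables are assumed for the
example. The fitted gam object stores the Bayesian posterior covariance
matrix of the coefficients in its Vp component, and predict() and plot() use
it; note these intervals still condition on the estimated smoothing
parameters, which is exactly the caveat discussed above.]

library(mgcv)
b  <- gam(y ~ s(x1) + s(x2), family = binomial, data = dat)
pr <- predict(b, se.fit = TRUE)   # s.e.'s from the Bayesian covariance b$Vp
plot(b, se = TRUE)                # component-wise +/- 2 s.e. bands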