I've been groping my way through a classification/discrimination problem, from a consulting client. There are 26 observations, with 4 possible categories and 24 (!!!) potential predictor variables.

I tried using lda() on the first 7 predictor variables and got 24 of the 26 observations correctly classified. (Training and testing both on the complete data set --- just to get started.)

I then tried rpart() for comparison and was somewhat surprised when rpart() only managed to classify 14 of the 26 observations correctly. (I got the same classification using just the first 7 predictors as I did using all of the predictors.)

I would have thought that rpart(), being unconstrained by a parametric model, would have a tendency to over-fit and therefore to appear to do better than lda() when the test data and training data are the same.

Am I being silly, or is there something weird going on? I can give more detail on what I actually did, if anyone is interested.

The data are pretty obviously nothing like Gaussian, so my gut feeling is that rpart() should be much more appropriate than lda(). And it does not seem surprising that, with so few observations to train with, the success rate should be low, even when testing and training on the same data set. What does surprise me is that lda() gets such a high success rate. Should I just put this down as a random occurrence of a low-probability event?

cheers,

Rolf Turner
rolf at math.unb.ca

P.S. Using CV=TRUE in lda() I got only 16 of the 26 observations correctly classified.
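For concreteness, here is a minimal R sketch of the workflow described above. The data frame dat, its 4-level factor grp, and the predictor names x1, ..., x7 are hypothetical stand-ins for the consulting data, which was not posted; the counts in the comments are simply the ones Rolf reports.

library(MASS)   # lda()
library(rpart)  # rpart()

## LDA on the first 7 predictors, training and testing on the same data
fit.lda  <- lda(grp ~ x1 + x2 + x3 + x4 + x5 + x6 + x7, data = dat)
pred.lda <- predict(fit.lda)$class        # resubstitution predictions
table(pred.lda, dat$grp)                  # confusion matrix
sum(pred.lda == dat$grp)                  # 24 of 26 reported in the post

## rpart on the same predictors, same resubstitution test
fit.rp  <- rpart(grp ~ x1 + x2 + x3 + x4 + x5 + x6 + x7,
                 data = dat, method = "class")
pred.rp <- predict(fit.rp, type = "class")
sum(pred.rp == dat$grp)                   # 14 of 26 reported in the post

## Leave-one-out cross-validation, as in the P.S.
fit.cv <- lda(grp ~ x1 + x2 + x3 + x4 + x5 + x6 + x7,
              data = dat, CV = TRUE)
sum(fit.cv$class == dat$grp)              # 16 of 26 reported in the post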
On Tue, 11 Feb 2003, Rolf Turner wrote:

> I would have thought that rpart(), being unconstrained by a parametric
> model, would have a tendency to over-fit and therefore to appear to
> do better than lda() when the test data and training data are the
> same.
>
> Am I being silly, or is there something weird going on? I can
> give more detail on what I actually did, if anyone is interested.

The first. rpart is seriously constrained by having so few observations, and its model is much more restricted than lda: axis-parallel splits only. There is a similar example, with pictures, in MASS (on Cushings).

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
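A rough sketch of Ripley's point, using the Cushings data from MASS (the example he cites). This is not the book's code: the log transform follows MASS, but the loosened rpart control setting is an assumption added here purely so the tree makes more than one split on 21 observations.

library(MASS)   # lda() and the Cushings data
library(rpart)

cush <- subset(Cushings, Type != "u")   # drop the unclassified cases
cush$Type <- factor(cush$Type)          # drop the now-unused level "u"
cush[, 1:2] <- log(cush[, 1:2])         # log scale, as in MASS

## lda draws boundaries that are linear in both predictors jointly
fit.lda <- lda(Type ~ Tetrahydrocortisone + Pregnanetriol, data = cush)

## rpart can only cut perpendicular to one axis at a time
fit.rp <- rpart(Type ~ Tetrahydrocortisone + Pregnanetriol, data = cush,
                method = "class",
                control = rpart.control(minsplit = 5))
fit.rp   # every split printed is of the form  variable < threshold

With only 21 classified cases split across 3 groups, rpart's default settings would barely split at all, which is exactly the "seriously constrained by having so few observations" point.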