Hi!
This is a question about lda (MASS) in R on a particular dataset.
I'm not a specialist in any of this, but:
First, with the well-known "iris" dataset, I tried using lda to
discriminate versicolor from the other two classes, and I got approx. 70%
accuracy testing on the training set. In iris, versicolor stands "between"
the 2 others, so one can expect lda not to perform well, since it cannot
cluster the negative instances (setosa+virginica) together. (Is this
correct?) (KNN=96% in xval.)
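
For concreteness, the iris experiment described above can be sketched
roughly as follows. This is a minimal reconstruction, not the exact code
that was run: the names `iris2` and `target` are illustrative, and using
all four measurements as predictors is an assumption.

```r
library(MASS)  # provides lda()

# Relabel iris as a two-class problem: versicolor vs. the rest.
# 'iris2' and 'target' are illustrative names.
iris2 <- iris[, 1:4]
iris2$target <- factor(ifelse(iris$Species == "versicolor",
                              "versicolor", "other"))

# Fit LDA on all four measurements (an assumption about the setup).
fit <- lda(target ~ ., data = iris2)

# Resubstitution accuracy: predict on the same rows used for fitting.
pred <- predict(fit, iris2)$class
train_acc <- mean(pred == iris2$target)

print(table(predicted = pred, actual = iris2$target))
print(train_acc)
```

Since this tests on the training set, the number it prints is likely to be
optimistic compared with a cross-validated estimate.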
Now, I use my "real" dataset (900 instances, 21 attributes), whose 2
classes can be separated with an accuracy of no more than 80% (10xval)
with KNN, SVM, C4.5 and the like.
So I was very surprised to see that lda also gets an accuracy of 80% on it,
because lda is very simple: for a 2-class problem it finds the best line
and classifies using the projections onto that line.
So my question is: how does lda (in MASS) use the projections to make
the decision? Usually the decision for a test instance is made
using the means and variances of the 2 classes, but there are other
possibilities (especially in higher dimensions).
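
One way to look into this is to inspect what `predict()` returns for an
lda fit. The sketch below uses the two-class iris relabelling as a
stand-in (the names `d` and `target` are illustrative). It only checks
that the returned class is the label with the largest posterior
probability; it does not show how the posteriors are derived from the
projections.

```r
library(MASS)

# Illustrative two-class relabelling of iris as a stand-in dataset.
d <- iris[, 1:4]
d$target <- factor(ifelse(iris$Species == "versicolor",
                          "versicolor", "other"))

fit <- lda(target ~ ., data = d)
p <- predict(fit, d)

# Ingredients stored in the fit: class priors, class means, and the
# coefficients of the single discriminant direction (LD1).
print(fit$prior)
print(fit$means)
print(fit$scaling)

# predict() returns the projections (x), the posterior probabilities,
# and the predicted class for each row.
print(head(p$x))
print(head(p$posterior))

# Check: the predicted class is the label with the largest posterior.
top <- colnames(p$posterior)[apply(p$posterior, 1, which.max)]
agree <- all(top == as.character(p$class))
print(agree)
```

If that check holds, the decision depends on the priors as well as on the
projected class means, so changing the `prior` argument should move the
cutoff along the line.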
Thanks for any ideas; the doc is a bit sparse on this particular matter,
and so is Venables & Ripley's book.
Samuel
PS: Does anybody know how to use the CV option of lda to do xval?
I can't get it to work.
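
As far as the `lda` help page describes it, `CV = TRUE` makes `lda()`
return leave-one-out cross-validation results (a list with `class` and
`posterior`) instead of a fitted model, so `predict()` is not called on
the result. A minimal sketch on the two-class iris relabelling, with
illustrative names `d` and `target`:

```r
library(MASS)

# Illustrative two-class relabelling of iris as a stand-in dataset.
d <- iris[, 1:4]
d$target <- factor(ifelse(iris$Species == "versicolor",
                          "versicolor", "other"))

# With CV = TRUE, lda() returns leave-one-out predictions directly
# (components 'class' and 'posterior'), not a fitted model object.
cv <- lda(target ~ ., data = d, CV = TRUE)

# Leave-one-out accuracy and confusion table.
loo_acc <- mean(cv$class == d$target)
print(table(predicted = cv$class, actual = d$target))
print(loo_acc)
```

Note that this is leave-one-out rather than 10-fold; a 10-fold estimate
would presumably need the folds to be split by hand, fitting `lda()` on
nine folds and calling `predict()` on the held-out one.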