On Fri, 28 Apr 2000, Clayton Springer wrote:
> Dear R folks,
>
> Thanks to all your help before I have loaded a 1-D toy data set into
> R and did LDA on it. The toy data has Class=0 if value>0.
>
> > XY <- read.table("test.xy", header = T)
> > XY
> X.Class value
> 1 0 60.4897262
> 2 0 32.9554489
> 3 -1 -53.6459189
> 4 0 44.4450579
> .
> .
> .
> 998 -1 -43.4183157
> 999 0 7.9865092
> 1000 -1 -8.2279180
> > XY.lda <- lda(X.Class ~ value,XY)
> > XY.lda
> Call:
> lda.formula(X.Class ~ value, data = XY)
>
> Prior probabilities of groups:
> -1 0
> 0.521 0.479
>
> Group means:
> value
> -1 -48.66322
> 0 49.91819
>
> Coefficients of linear discriminants:
> LD1
> value 0.0357248
> > XY.lda$svd
> [1] 55.63543
> > XY.lda$class
> NULL
> > XY.lda$posterior
> NULL
>
> Question #1: How do I obtain the line that lda thinks divides the
> two groups? (which here lies between 1 and 2)
Use the prediction equation, and solve for equal probabilities in the
groups.
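Concretely, a sketch using only the numbers printed by lda() above: for two
groups with a common within-group variance sigma^2, the posteriors are equal
where the prior-weighted normal densities cross, and lda() scales LD1 to unit
within-group variance, so the pooled sd is the reciprocal of the LD1
coefficient.

```r
## Sketch only, plugging in the values from the lda() output above.
## For two classes with common within-group variance sigma^2 the
## posterior probabilities are equal at
##   x* = (mu1 + mu2)/2 + sigma^2 * log(pi1/pi2) / (mu2 - mu1)
mu    <- c(-48.66322, 49.91819)   # group means from the output above
prior <- c(0.521, 0.479)          # prior probabilities from the output above
sigma <- 1 / 0.0357248            # pooled within-group sd = 1 / LD1 coefficient
x.star <- mean(mu) + sigma^2 * log(prior[1] / prior[2]) / (mu[2] - mu[1])
x.star                            # about 1.3, i.e. between 1 and 2
```

which agrees with the dividing point the prediction below locates between
1 and 2.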
> Next I load in a test set for prediction:
>
> > Predict0
> value
> 1 -10
> 2 -9
> 3 -8
> 4 -7
> 5 -6
> 6 -5
> 7 -4
> 8 -3
> 9 -2
> 10 -1
> 11 0
> 12 1
> 13 2
> 14 3
> 15 4
> 16 5
> 17 6
> 18 7
> 19 8
> 20 9
> 21 10
>
> > Predict0.lda <- predict(XY.lda,Predict0)
> > Predict0.lda$class
> [1] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 0 0 0 0 0 0 0 0
>
> For those who don't want to count, this shows that the dividing
> line is somewhere between 1 & 2, even though my toy data set
> can be perfectly divided at 0. I had not expected (Fischer's) LDA
> to behave this way.
lda is not Fisher's (no c) LDF; it is Rao's LDA. In particular, it takes
the class prevalences into account unless you set the prior argument. lda
is not a perceptron, nor logistic discrimination.
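For example (a sketch on simulated data in the spirit of the post, since the
original test.xy is not available): with prior = c(0.5, 0.5) the prevalence
term drops out and the boundary moves to the midpoint of the two group means,
near, but not exactly at, zero.

```r
library(MASS)  # for lda()
set.seed(1)
## Simulated stand-in for the poster's data: Class is 0 iff value > 0
value <- c(rnorm(500, -50, 28), rnorm(500, 50, 28))
XY <- data.frame(X.Class = factor(ifelse(value > 0, 0, -1)), value = value)

## Equal priors: the log(prior ratio) term vanishes, so the boundary
## is the midpoint of the two group means
XY.lda.eq <- lda(X.Class ~ value, XY, prior = c(0.5, 0.5))
predict(XY.lda.eq, data.frame(value = c(-10, 10)))$class
```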
> Question #2: Are there parameter adjustments and/or other LDA methods
> where I can get the expected dividing surface at 0? (Presumably
> a classification tree would choose the line I desire, but I want
> a lda method that does this.)
No, a tree will not (it will not use linear combinations). Why do you
think the dividing surface should be at zero? Your training set is
asymmetric. I think you are looking for logistic discrimination not lda,
and are confusing performance on the training set with performance on
future examples: lda `knows' the populations are normally distributed.
I think you need to understand better the theory behind lda: see the book
for which it is supporting software.
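For completeness, a sketch of logistic discrimination via glm(), again on
simulated data in the spirit of the post. Note that a training set which is
perfectly separable at 0, as the poster's is, makes glm() warn and the
coefficients diverge, but the fitted p = 0.5 crossover still lands in the gap
between the classes, essentially at 0.

```r
set.seed(1)
## Simulated stand-in for the poster's data: Class is 0 iff value > 0
value <- c(rnorm(500, -50, 28), rnorm(500, 50, 28))
XY <- data.frame(X.Class = factor(ifelse(value > 0, 0, -1)), value = value)

## Logistic discrimination models Pr(class | x) directly, with no
## normality assumption.  These data are perfectly separated at 0, so
## glm() warns that fitted probabilities of 0 or 1 occurred and the
## coefficients are not well determined; even so, the p = 0.5 point
## -b0/b1 sits in the gap between the two classes, close to 0.
fit <- suppressWarnings(
  glm(I(X.Class == "0") ~ value, data = XY, family = binomial)
)
boundary <- unname(-coef(fit)[1] / coef(fit)[2])
boundary   # close to 0
```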
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject!)
to: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._