hi there,
i have one more question about LDA. just to make sure i understand,
suppose we have two classes, then if i specify a prior=c(.3,.7) in
lda(...) this will affect my between classes covariance matrix as in:
SB = (.3*m1 - .7*m2) %*% inv(Sigma) %*% t(.3*m1 - .7*m2)
[is Sigma affected ?] and the threshold to decide which class to assign
'test' data = log(.3/.7)
if i specify a prior=c(.2,.8) in predict(...), but not in lda(...) then
SB will not be affected, but the threshold to decide which class to
assign to my 'test' data will be at log(.8/.2)
--- --- --- manual --- --- ---
Details:
The function tries hard to detect if the within-class covariance
matrix is singular. If any variable has within-group variance less
than `tol^2' it will stop and report the variable as constant.
This could result from poor scaling of the problem, but is more
likely to result from constant variables.
Specifying the `prior' will affect the classification unless
over-ridden in `predict.lda'. Unlike in most statistical packages,
it will also affect the rotation of the linear discriminants
within their space, as a weighted between-groups covariance matrix
is used. Thus the first few linear discriminants emphasize the
differences between groups with the weights given by the prior,
which may differ from their prevalence in the dataset.
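
For anyone following along, here is a minimal sketch in R of the two places a
prior can be supplied. The simulated data and the names x, g, fit_default and
fit_prior are made up for illustration; only lda() and its predict() method
from MASS are real. With two classes there is a single discriminant, so the
"rotation" described above cannot show up here; what the sketch checks is
that a prior given to predict() overrides whatever was stored by lda().

```r
library(MASS)  # provides lda() and predict.lda()

set.seed(1)
# Hypothetical simulated data: two classes, two variables, common covariance
n1 <- 60
n2 <- 40
x <- rbind(matrix(rnorm(2 * n1), ncol = 2),
           matrix(rnorm(2 * n2, mean = 1.5), ncol = 2))
colnames(x) <- c("v1", "v2")
g <- factor(rep(c("a", "b"), c(n1, n2)))

# Fit once with the default prior (class proportions), once with an explicit one
fit_default <- lda(x, grouping = g)
fit_prior   <- lda(x, grouping = g, prior = c(0.3, 0.7))

# The prior stored in each fit is what predict() uses by default
fit_default$prior
fit_prior$prior

# A prior passed to predict() replaces the stored one, so both fits
# give the same posterior probabilities here (two classes, one discriminant)
p1 <- predict(fit_default, x, prior = c(0.2, 0.8))$posterior
p2 <- predict(fit_prior,   x, prior = c(0.2, 0.8))$posterior
all.equal(p1, p2)

# Changing the prior shifts the cut-off, which changes the class counts
table(predict(fit_default, x)$class)
table(predict(fit_default, x, prior = c(0.2, 0.8))$class)
```

Repeating this with three or more groups and comparing fit$scaling between
the two fits is one way to look at the effect on the discriminants themselves.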
Do read the reference: MASS (the book), *and* the code. Your question is
addressed in the primary reference for the function, with references to
the original papers.

On Sun, 25 May 2003, Edoardo M Airoldi wrote:

> i have one more question about LDA. just to make sure i understand,
> suppose we have two classes, then if i specify a prior=c(.3,.7) in
> lda(...) this will affect my between classes covariance matrix as in:
>
> SB = (.3*m1 - .7*m2) %*% inv(Sigma) %*% t(.3*m1 - .7*m2)
>
> [is Sigma affected ?] and the threshold to decide which class to assign
> 'test' data = log(.3/.7)

Sigma is undefined! That symbol is normally used to indicate the
*population* within-class covariance matrix.

But no, you do not seem to understand, so please consult your local
statistical experts (since you seem inexplicably loath to read our book).

Overview for well-informed readers: the `Fisher' view of LDA has no
priors: the Rao and Bryan views do, and they differ. In practice it only
matters if LDA is used when the within-class covariances are not common
but the conventional theory assumes the model is correct.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595