Hello:

I am new to R and statistics and I have two questions.

First I need help to interpret the cross-validation result from the R linear discriminant analysis function "lda". I did the following:

    lda (group ~ Var1 + Var2, CV=T)

where "CV=T" tells the lda to do cross-validation. The output of lda are the posterior probabilities among other things, but I can't find an error term (like delta returned by cv.glm). My question is how to get such an error term from the output? Can I just simply calculate the prediction accuracy using the posterior probabilities from the cross-validation, and use that to measure the quality of the model?

Another question is more basic: how to determine if a lda model is significant? (There is no p-value.)

Thanks,

Yu Shao
Wadsworth Research Center
Department of Health of New York State
Albany, NY 12208

Prof Brian Ripley
2004-Sep-16 04:50 UTC
[R] Cross-validation for Linear Discriminant Analysis
On Wed, 15 Sep 2004, Yu Shao wrote:

> I am new to R and statistics and I have two questions.

Perhaps then you need to start by explaining why you are using LDA. Please take a good look at the posting guide.

> First I need help to interpret the cross-validation result from the R
> linear discriminant analysis function "lda".

You mean Professor Ripley's function lda in package MASS, I guess.

> I did the following:
>
> lda (group ~ Var1 + Var2, CV=T)

R allows you to use meaningful names, so please do so.

> where "CV=T" tells the lda to do cross-validation. The output of lda are
> the posterior probabilities among other things, but I can't find an error
> term (like delta returned by cv.glm). My question is how to get such an
> error term from the output? Can I just simply calculate the prediction
> accuracy using the posterior probabilities from the cross-validation, and
> use that to measure the quality of the model?

cv.glm as in Dr Canty's package boot?

If you are trying to predict classifications, LDA is not the right tool, and LOO CV probably is not either. There is no unique definition of `error term' (true for cv.glm as well), and people have written whole books about how to assess classifiers. LDA is about `discrimination' not `allocation' in the jargon used ca 1960.

> Another question is more basic: how to determine if a lda model is
> significant? (There is no p-value.) Thanks,

Please do read the references on the ?lda page. It's not a useful question, as LDA is about discriminating between populations and makes the unrealistic assumption of multivariate normality. (Analogously for linear regression, there are ways to test if that is (statistically) `significant', but knowledgable users almost never do so.)

Perhaps more realistic advice is to suggest you seek some statistical consultancy.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
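
For readers wondering what the CV=T output discussed above actually contains, here is a minimal sketch. It is not taken from either poster: the built-in iris data and its column names stand in for the original `group`, `Var1` and `Var2`, which were never shown. With CV = TRUE, lda in package MASS returns the leave-one-out predicted classes and posterior probabilities, from which a misclassification rate can be tabulated. As the reply notes, this is only one of many possible "error terms", and whether it is a sensible measure depends on the problem.

```r
library(MASS)  # provides lda(); MASS is a recommended package shipped with R

# Assumption: iris stands in for the poster's data, with Species playing
# the role of 'group' and two measurements the roles of Var1 and Var2.
fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris, CV = TRUE)

# With CV = TRUE the result is a list holding the leave-one-out
# predictions: $class (a factor) and $posterior (a matrix with one
# column per class).
confusion <- table(actual = iris$Species, predicted = fit$class)
print(confusion)

# One possible summary: the leave-one-out misclassification rate.
loo_error <- mean(fit$class != iris$Species)
print(loo_error)

# Another summary that uses the posteriors directly: the average
# posterior probability assigned to each observation's true class.
idx <- cbind(seq_len(nrow(iris)), as.integer(iris$Species))
mean_true_posterior <- mean(fit$posterior[idx])
print(mean_true_posterior)
```

The two summaries can disagree about which of two models is better, which illustrates the point that there is no unique definition of an error term.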