thr3ads.net - R help - [R] Problems with lda-CV, and collinear variables in lda [Aug 2012]

Hennig, Christian

2012-Aug-14 11:53 UTC

[R] Problems with lda-CV, and collinear variables in lda

Dear R-help list,

I have two issues regarding lda.
1) I'm puzzled that lda's built-in cross-validation (CV=TRUE) gives results
different from the manual leave-one-out cross-validation routine that I run
(of course mine may be wrong, but I don't think so).

See here:
library(MASS)
set.seed(12345)
n <- 50
p <- 10 # or p<- 200
testdata <- matrix(ncol=p,nrow=n)
for (i in 1:p)
   testdata[,i] <- rnorm(n)
class <- as.factor(c(rep(1,25),rep(2,25)))

lda1 <- lda(x=testdata,grouping=class,CV=TRUE)
table1 <- table(lda1$class,class)


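# Manual leave-one-out CV: refit without observation i, then predict it
y.lda <- rep(NA_character_, n)
for(i in 1:n){
   testset <- testdata[i,,drop=FALSE]
   trainset <- testdata[-i,]
   model.lda <- lda(x=trainset,grouping=class[-i])
   # store the predicted label, not the factor's internal integer code
   y.lda[i] <- as.character(predict(model.lda, testset)$class)
}
table2 <- table(y.lda, class)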

With p=10:
> table1
   class
     1  2
  1 10 11
  2 15 14
> table2
     class
y.lda  1  2
    1 10 12
    2 15 13

Why are these not the same?
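
One thing that may be worth checking, offered as an assumption rather than
something stated on the help page: the manual loop lets lda estimate the class
priors from each 49-observation training set (24/49 versus 25/49), whereas the
CV=TRUE run might use the proportions from all 50 observations. The sketch
below repeats the manual loop with the priors held fixed through lda's prior
argument. The names full.prior, y.fixed and table3 are made up for this
illustration, and whether table3 then agrees with table1 is left to be checked.

```r
library(MASS)

# Same setup as above (p = 10)
set.seed(12345)
n <- 50
p <- 10
testdata <- matrix(ncol = p, nrow = n)
for (i in 1:p)
  testdata[, i] <- rnorm(n)
class <- as.factor(c(rep(1, 25), rep(2, 25)))

# Assumption under test: hold the priors at the full-data class proportions
# in every fold instead of re-estimating them from the training set
full.prior <- as.vector(table(class)) / n

y.fixed <- rep(NA_character_, n)
for (i in 1:n) {
  fit <- lda(x = testdata[-i, ], grouping = class[-i], prior = full.prior)
  y.fixed[i] <- as.character(predict(fit, testdata[i, , drop = FALSE])$class)
}

# Compare this with table1 from the CV=TRUE run
table3 <- table(factor(y.fixed, levels = levels(class)), class)
```

If table3 still differs from table1, the priors alone do not explain the
discrepancy.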

Getting closer to my second issue, it gets worse when p>n, e.g., p=200:
> table1
   class
     1  2
  1 14 16
  2 11  9
> table2
     class
y.lda  1  2
    1 15 10
    2 10 15

2) I cannot find a proper explanation on the help page of how lda is computed
for p>n. Its standard definition involves inverting the within-class
covariance matrix, which is singular for p>n. lda does give a warning when
p>n, but occasionally the cross-validated results are quite good. I have a
guess at how it is done but would be grateful for clarification.
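
To make the singularity concrete, here is a minimal sketch that builds the
pooled within-class covariance matrix by hand for p=200 and checks its rank.
It only illustrates why the textbook inverse does not exist; it makes no claim
about what lda does internally. The names centred, W and W.rank are chosen for
this example.

```r
set.seed(12345)
n <- 50
p <- 200
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
class <- factor(rep(1:2, each = 25))

# Class means (one row per class), then centre each row at its class mean
class.means <- rowsum(x, class) / as.vector(table(class))
centred <- x - class.means[as.integer(class), ]

# Pooled within-class covariance matrix (p x p)
W <- crossprod(centred) / (n - nlevels(class))

# The centred data have at most n - (number of classes) independent rows,
# so the rank of W cannot exceed 48 here, far below p = 200
W.rank <- qr(W)$rank
W.rank
```

Since the rank of W is bounded by n minus the number of classes, solve(W)
cannot succeed for p>n, and any implementation has to restrict itself to a
subspace or regularise in some way.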

Best regards,
Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
