I have two specific questions about the output of the lda function in MASS.
#Question1:
#========
(n: sample size, p: number of variables)
Some articles in the literature state that LDA is singular
for p > n-1. However, my experiments with lda (default arguments) on
two-class problems already give a collinearity warning for p > n-2.
Does anyone know why this is the case? Does lda (MASS) use a
different algorithm?
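A quick way to see where an n-2 threshold could come from is to check the rank of the data after subtracting each class mean, since LDA inverts the pooled within-class covariance built from these deviations. Within each class the centred rows sum to zero, so each estimated class mean costs one degree of freedom and the centred matrix has rank at most n - g (g = number of classes, here 2). The sketch below assumes that lda's collinearity warning reflects this within-class matrix falling short of full column rank; one plausible reading is that the p > n-1 bound refers to a single-sample covariance, which loses only one degree of freedom. `within_rank` is a hypothetical helper, not part of MASS.

```r
# Hypothetical helper (not part of MASS): rank of the class-centred data matrix,
# i.e. of the pooled within-class scatter that LDA needs to invert.
within_rank <- function(n, p, seed = 1) {
  set.seed(seed)
  x <- matrix(rnorm(n * p), ncol = p)
  y <- rep(c("G1", "G2"), each = n / 2)
  # Subtract each class's column means from the rows of that class
  centred <- x
  for (g in unique(y)) {
    rows <- y == g
    centred[rows, ] <- sweep(x[rows, , drop = FALSE], 2,
                             colMeans(x[rows, , drop = FALSE]))
  }
  qr(centred)$rank
}

n <- 20
for (p in 17:20) {
  r <- within_rank(n, p)
  cat(sprintf("p = %d: within-class rank = %d (%s)\n", p, r,
              if (r < p) "singular" else "full rank"))
}
```

With n = 20 the rank should stop growing at n - 2 = 18, so p = 19 and p = 20 leave the within-class matrix singular while p = 18 does not, which matches the warnings from test.fun.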
#Question2:
#========
When I plot leave-one-out CV results for lda (averaged over
500 simulated data sets), I see a peak (see
http://homepages.ed.ac.uk/mkhondok/temp/lda_R-help-CV.png )
at p = n-3 (not n-2!). I would appreciate it if someone could
explain this behaviour.
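One possible explanation, offered as an assumption rather than a verified answer: in leave-one-out CV each model is fitted to only n - 1 observations. Since both class means are estimated from those training rows, the pooled within-class covariance of n - 1 observations in two classes has rank at most (n - 1) - 2 = n - 3. At p = n - 3 the training covariance would still be nonsingular but only barely, which is where an ill-conditioned inverse might be expected to hurt most; for larger p, lda warns and continues with a rank-reduced fit. The sketch below refits lda on each LOO training set explicitly (instead of using `lda(..., CV = TRUE)`) for a single simulated data set, following the simulation design of test.fun in the reproducible example. `loo_error` is a hypothetical helper.

```r
library(MASS)

# Hypothetical helper: leave-one-out (LOO) misclassification rate of lda()
# on one data set simulated as in test.fun. Each fold refits lda() on n - 1 rows.
loo_error <- function(n, p, seed = 1) {
  set.seed(seed)
  x <- matrix(rnorm(n * p), ncol = p)
  x[1:(n / 2), ] <- x[1:(n / 2), ] + 1        # shift the mean of class G1
  colnames(x) <- paste0("V", 1:p)
  y <- factor(rep(c("G1", "G2"), each = n / 2))
  wrong <- logical(n)
  for (i in seq_len(n)) {
    # Training set of n - 1 rows; "variables are collinear" warnings are
    # expected once p exceeds (n - 1) - 2, so they are suppressed here
    fit <- suppressWarnings(lda(x[-i, , drop = FALSE], grouping = y[-i]))
    pred <- predict(fit, x[i, , drop = FALSE])$class
    wrong[i] <- pred != y[i]
  }
  mean(wrong)
}

n <- 20
for (p in c(5, n - 4, n - 3, n - 2)) {
  cat(sprintf("p = %2d: LOO error = %.2f\n", p, loo_error(n, p)))
}
```

A single data set is noisy; averaging over seeds, e.g. `mean(sapply(1:50, function(s) loo_error(20, 17, seed = s)))`, is closer to the averaged curve in the plot.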
## R code
## Reproducible example
library(MASS)

# n: sample size
# p: number of variables

## Function
## --------
test.fun <- function(n, p) {
  # n x p matrix of independent N(0, 1) variables
  x <- matrix(rnorm(n * p), ncol = p)
  # Shift the first n/2 rows (class G1) by +1 in every variable
  x[1:(n / 2), ] <- x[1:(n / 2), ] + 1
  colnames(x) <- paste("V", 1:p, sep = "")
  # Class labels: first half G1, second half G2
  y <- rep(c("G1", "G2"), each = n / 2)
  dat <- data.frame(y, x)
  lda(y ~ ., data = dat)
}

test.fun(20, 20)  ## Warning: variables are collinear
test.fun(20, 19)  ## Warning: variables are collinear
test.fun(20, 18)  ## OK
> sessionInfo()
R version 2.8.0 (2008-10-20)
i486-pc-linux-gnu
locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MASS_7.2-45
--
Mizanur Khondoker
Division of Pathway Medicine (DPM)
The University of Edinburgh Medical School
The Chancellor's Building
49 Little France Crescent
Edinburgh EH16 4SB
United Kingdom
Tel: +44 (0) 131 242 6287
Fax: +44 (0) 131 242 6244
http://homepages.ed.ac.uk/mkhondok