Hi, Christoph:
1. I didn't see in your original email that you wanted V to be
orthogonal, only that it's columns have length 1. You have a solution
satisfying the latter constraint, but not the former.
2. I don't have time now to sort out the details, and I don't have
them on the top of my head. I just entered "lda" into R 1.6.2 [after
library(MASS)] and got the following:
> lda
function (x, ...)
{
if (is.null(class(x)))
class(x) <- data.class(x)
UseMethod("lda", x, ...)
}
To decode 'UseMethod("lda", ...)', I requested
'methods("lda")' with
the following result:
> methods("lda")
[1] "lda.data.frame" "lda.default"
"lda.formula" "lda.matrix"
Have you tried listing each of these 4 functions and working through
them step by step? I think this should answer your question. Also see
Venables and Ripley (2002) Modern Applied Statistics with S, index entry
for "lda".
hth. spencer graves
Christoph Lehmann wrote:
> thanks a lot, Spencer
>
> The problem is the following: my textbook has an example with the data:
>
> X> x
> x1 x2 x3
> 1 3 3 4
> 2 4 4 3
> 3 4 4 6
> 4 2 5 5
> 5 2 4 5
> 6 3 4 6
> 7 3 4 4
> 8 2 5 5
> 9 4 3 6
> 10 5 5 6
> 11 4 5 7
> 12 4 6 4
> 13 3 6 6
> 14 4 7 6
> 15 6 5 6
> --
>
>>y
>
> 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> 1 1 1 1 1 1 2 2 2 2 3 3 3 3 3
> --
>
>>Dtot <- (t(x)%*%x-t(xbar)%*%xbar)
>>Dtot
>
>
> x1 x2 x3
> x1 17.733333 2.666667 4.866667
> x2 2.666667 17.333333 4.333333
> x3 4.866667 4.333333 16.933333
> --
>
>>A <- cbind(tapply(x[,1],y,sum), tapply(x[,2],y,sum),
>
> tapply(x[,3],y,sum))
>
>>A
>
> [,1] [,2] [,3]
> 1 18 24 29
> 2 14 17 21
> 3 21 29 29
>
>>G <- apply(x,2,sum)
>>G
>
> x1 x2 x3
> 53 70 79
>
>>p <- ncol(x)
>>k <- length(freq)
>>N <- sum(freq)
>>Dtreat <- array(0,c(p,p))
>>k <- length(freq)
>>for (i in 1:p)
>
> + {
> + for (j in 1:k)
> + {
> + for (h in 1:k)
> + {
> + Dtreat[i,j] <- Dtreat[i,j] + A[h,i]*A[h,j]/freq[h]
> + }
> + Dtreat[i,j] <- Dtreat[i,j] - G[i]*G[j]/N
> + }
> + }
>
>>Dtreat
>
> [,1] [,2] [,3]
> [1,] 3.933333 5.966667 3.166667
> [2,] 5.966667 9.783333 4.783333
> [3,] 3.166667 4.783333 2.550000
> --
>
>>Derror <- Dtot-Dtreat
>>Derror
>
>
> x1 x2 x3
> x1 13.8 -3.30 1.70000
> x2 -3.3 7.55 -0.45000
> x3 1.7 -0.45 14.38333
>
> --
>
>>eigen(Dtreat%*%solve(Derror))
>
> $values
> [1] 2.300398e+00 2.039672e-02 -1.907034e-15
>
> $vectors
> [,1] [,2] [,3]
> [1,] -0.4870772 0.6813155 -0.6076020
> [2,] -0.7809602 -0.4342229 0.1539928
> [3,] -0.3909693 0.5892874 0.7791701
>
>
>>V <- eigen(Dtreat%*%solve(Derror))$vectors
>>V
>
> [,1] [,2] [,3]
> [1,] -0.4870772 0.6813155 -0.6076020
> [2,] -0.7809602 -0.4342229 0.1539928
> [3,] -0.3909693 0.5892874 0.7791701
>
> the textbook (SPSS) has similar eigenvalues, but only two!:
>
> lambda1 = 2.30048, lambda2 = 0.02091
> , but as I wrote in the last mail: different eigenvectors
>
> Let's start here with your recommendation:
> first, it seems, since the last eigenvalue is almost 0, that the
> eigenvectors V are not orthogonal:
>
>
>>t(V)%*%V
>
> [,1] [,2] [,3]
> [1,] 1.0000000 -0.22313575 -0.12894473
> [2,] -0.2231357 1.00000000 -0.02168078
> [3,] -0.1289447 -0.02168078 1.00000000
>
> let's continue anyway?
>
>>D.5 <- chol(Derror)
>>t(D.5) %*% D.5
>
>
> x1 x2 x3
> x1 13.8 -3.30 1.70000
> x2 -3.3 7.55 -0.45000
> x3 1.7 -0.45 14.38333
>
>>Dm.5 <- solve(D.5)
>>t(Dm.5) %*% Derror %*% Dm.5
>
>
> x1 x2 x3
> x1 1.000000e+00 -2.523481e-17 -1.097755e-18
> x2 -6.625163e-18 1.000000e+00 -2.120970e-18
> x3 4.501901e-18 4.460942e-19 1.000000e+00
> perfectly orthogonal
>
>>t(V)%*%t(Dm.5)%*%Dfehler%*%Dm.5%*%V
>
>
> [,1] [,2] [,3]
> [1,] 1.0000000 -0.22313575 -0.12894473
> [2,] -0.2231357 1.00000000 -0.02168078
> [3,] -0.1289447 -0.02168078 1.00000000
> again, equals t(V)%*%V not orthogonal.
>
> -- I think it has to do with the fact, that the textbook considers the
> third eigenvalue as = 0 and then gets the Vstar eigenvectors (which I
> try to reproduce:
>
> Vstar > [,1] [,2] [,3]
> [1,] 0.1689 0.1419 -0.1825
> [2,] 0.3498 -0.1597 0.0060
> [3,] 0.0625 0.1422 0.2154
>
> -
>
> Spencer if you find some minutes time to help me reproduce this example,
> it would be very nice (the data are from Jones 1961. He investigated
> whether essays written by children from lower, middle, upper class
> differ in sentence length, choosen words, complexity of sentence)
>
> Cheers
>
> Christoph
>
##########################################
The following satisfies some of your constraints but I don't
know if it satisfies all of them.
Let V = eigenvectors normalized so t(V) %*% V = I. Also, let
D.5 = some square root matrix, so t(D.5) %*% D.5 = Derror, and Dm.5 solve(D.5) =
invers of D.5. The Choleski decomposition ("chol")
provides one such solution, but you can construct a symmetric square
root using "eigen". Then Vstar = Dm.5%*%V will have the property you
mentioned below.
Consider the following:
> (Derror <- array(c(1,1,1,4), dim=c(2,2)))
[,1] [,2]
[1,] 1 1
[2,] 1 4
> D.5 <- chol(Derror)
> t(D.5) %*% D.5
[,1] [,2]
[1,] 1 1
[2,] 1 4
> (Dm.5 <- solve(D.5))
[,1] [,2]
[1,] 1 -0.5773503
[2,] 0 0.5773503
> (t(Dm.5) %*% Derror %*% Dm.5)
[,1] [,2]
[1,] 1 0
[2,] 0 1
Thus,t(Vstar)%*%Derror%*%Vstar = t(V)%*%t(Dm.5)%*%Derror%*%Dm.5%*%V
= t(V)%*%V = I.
hope this helps. spencer graves
Christoph Lehmann wrote:
> Hi dear R-users
>
> I try to reproduce the steps included in a LDA. Concerning the
eigenvectors there is
> a difference to SPSS. In my textbook (Bortz)
> it says, that the matrix with the eigenvectors
>
> V
>
> usually are not normalized to the length of 1, but in the way that the
> following holds (SPSS does the same thing):
>
> t(Vstar)%*%Derror%*%Vstar = I
>
>
> where Vstar are the normalized eigenvectors. Derror is an
"error" or
> "within" squaresum- and crossproduct matrix (squaresum of the p
> variables on the diagonale, and the non-diagonal elements are the sum of
> the crossproducts). For Derror the following holds: Dtotal = Dtreat +
> Derror.
>
> Since I assume that many of you are familiar with this transformation:
> can anybody of you tell me, how to conduct this transformation in R?
> Would be very nice. Thanks a lot
>
> Cheers
>
> Christoph
>