thr3ads.net - R help - [R] help with lda function [Sep 2009]

Pete Shepard

2009-Sep-28 21:54 UTC

[R] help with lda function

I am having a problem understanding the lda package. I have a dataset here:

    [,1] [,2] [,3]
[1,] 2.95 6.63    0
[2,] 2.53 7.79    0
[3,] 3.57 5.65    0
[4,] 3.16 5.47    0
[5,] 2.58 4.46    1
[6,] 2.16 6.22    1
[7,] 3.27 3.52    1

If I do the following;

"names(d)<-c("y","x1","x2")
d$x1 = d$x1 * 100
d$x2 = d$x2 * 100
g<-lda( y ~ x1 + x2, data=d)
v2 <- predict(g, d)",

I get;
        LD1
1 -2.3769280
2 -2.7049437
3 -3.4748309
4 -0.9599825
5  4.2293774
6  2.6052193
7  2.6820884

However, if I do it manually,

"rawdata<-matrix(scan("tab1_1.dat"),ncol=3,byrow=T)
group <- rawdata[,1]
X <- 100 * rawdata[,2:3]
Apf <- X[group==1,]
Af <- X[group==0,]
xbar1 <- apply(Af, 2, mean)
S1 <- var(Af)
N1 <- dim(Af)[1]
xbar2 <- apply(Apf, 2, mean)
S2 <- var(Apf)
N2 <- dim(Apf)[1]
S<-((N1-1)*S1+(N2-1)*S2)/(N1+N2-2)
Sinv=solve(S)
d<-xbar1-xbar2
b <- Sinv %*% d
v <- X %*% b",

I get;

        [,1]
[1,] 164.4283
[2,] 166.2492
[3,] 170.5232
[4,] 156.5622
[5,] 127.7540
[6,] 136.7704
[7,] 136.3436

It seems there is an extra step that I am missing. Does the predict step
add a constant to the second set of values? Can anyone clear this up for
me?

David Winsemius

2009-Sep-28 22:41 UTC

After centering, scaling, and a sign reversal, your results match those
from lda out to the 4th decimal place. (By the way, lda is almost
certainly from the MASS package and not from an impossible-to-find
"lda package".)

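 > # txt is assumed to hold the seven manually computed scores from the
 > # question, one per line (its definition was not shown in the reply)
 > txt <- "164.4283
+ 166.2492
+ 170.5232
+ 156.5622
+ 127.7540
+ 136.7704
+ 136.3436"
 > read.table(textConnection(txt))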
         V1
1 164.4283
2 166.2492
3 170.5232
4 156.5622
5 127.7540
6 136.7704
7 136.3436
 > est <-read.table(textConnection(txt))
 > scale(est)
              V1
[1,]  0.7656185
[2,]  0.8712707
[3,]  1.1192567
[4,]  0.3092117
[5,] -1.3622976
[6,] -0.8391481
[7,] -0.8639119
attr(,"scaled:center")
      V1
151.233
attr(,"scaled:scale")
       V1
17.23484

 > LD1est <- read.table(textConnection(" LD1
+ 1 -2.3769280
+ 2 -2.7049437
+ 3 -3.4748309
+ 4 -0.9599825
+ 5  4.2293774
+ 6  2.6052193
+ 7  2.6820884"), header=T)


 > scale(LD1est)
          LD1
1 -0.7656170
2 -0.8712721
3 -1.1192555
4 -0.3092138
5  1.3622976
6  0.8391505
7  0.8639103
attr(,"scaled:center")
           LD1
-3.172066e-17
attr(,"scaled:scale")
      LD1
3.104591
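
The comparison above standardizes both sets of scores. As a rough sketch of
where the "extra step" comes from, the code below rebuilds the manual
discriminant from the data in the question and then applies what
predict() in MASS appears to do for the default priors: subtract the
overall mean of the predictors and rescale the coefficient vector so the
pooled within-group variance of the scores is 1. The names dat, dbar,
b_unit, centre, ld_manual and ld_mass are illustrative and not from the
thread, and the data frame is typed in by hand rather than read from
tab1_1.dat.

```r
library(MASS)  # lda() and its predict() method live in MASS

# Data from the question; predictors multiplied by 100 as in the post.
dat <- data.frame(
  y  = c(0, 0, 0, 0, 1, 1, 1),
  x1 = 100 * c(2.95, 2.53, 3.57, 3.16, 2.58, 2.16, 3.27),
  x2 = 100 * c(6.63, 7.79, 5.65, 5.47, 4.46, 6.22, 3.52)
)
X <- as.matrix(dat[, c("x1", "x2")])

# Manual Fisher discriminant, following the steps in the question.
Af  <- X[dat$y == 0, ]
Apf <- X[dat$y == 1, ]
S <- ((nrow(Af) - 1) * var(Af) + (nrow(Apf) - 1) * var(Apf)) / (nrow(X) - 2)
dbar <- colMeans(Af) - colMeans(Apf)
b <- solve(S, dbar)          # same as Sinv %*% d in the question
v <- drop(X %*% b)           # raw scores: uncentred, arbitrary scale

# Assumed "extra step": rescale b so that b' S b = 1, then centre X at the
# prior-weighted mean of the group means (the overall mean under the
# default priors, which are the observed class proportions).
b_unit <- b / sqrt(drop(t(b) %*% S %*% b))
centre <- colMeans(X)
ld_manual <- unname(drop(sweep(X, 2, centre) %*% b_unit))

fit <- lda(y ~ x1 + x2, data = dat)
ld_mass <- unname(drop(predict(fit, dat)$x))

# The sign of a discriminant axis is arbitrary, so compare up to sign.
sgn <- sign(sum(ld_manual * ld_mass))
all.equal(sgn * ld_manual, ld_mass)
```

If the assumptions hold, v should reproduce the 164.4283, ... column from
the question and the final all.equal() should report agreement, so the two
sets of scores differ only by a shift, a positive scale factor, and a sign.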

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
