cdm
2009-May-24 13:01 UTC
[R] Animal Morphology: Deriving Classification Equation with Linear Discriminant Analysis (lda)
Fellow R Users:
I'm not extremely familiar with lda or R programming, but a recent editorial
review of a manuscript submission has prompted a crash course. I am on this
forum hoping to solicit some much-needed advice for deriving a
classification equation.
I have used three basic measurements in lda to predict two groups: male and
female. I have a working model, a low Wilks' lambda, graphs, coefficients,
eigenvalues, etc. (see below). I adapted the sample analysis of Fisher's
(Anderson's) iris data provided in the MASS library to my own data.
My final step is simply to form the classification equation. The
classification equation applies the classification function coefficients to
the raw scores of each predictor to classify each case into a group, in this
case male or female. A more thorough explanation is provided:
"For cases with an equal sample size for each group the classification
function coefficient (Cj) is expressed by the following equation:
Cj = cj0+ cj1x1+ cj2x2+...+ cjpxp
where Cj is the score for the jth group, j = 1, ..., k, cj0 is the constant for
the jth group, and x = raw scores of each predictor. If W = within-group
variance-covariance matrix, and M = column matrix of means for group j, then
the constant cjo= (-1/2)CjMj" (Julia Barfield, John Poulsen, and Aaron
French
http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm).
I am unable to navigate this last step based on the R output I have. I only
have the linear discriminant coefficients for each predictor that would be
needed to complete this equation.
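For what it's worth, the classification function coefficients in the quoted formula can be computed directly from the raw data rather than from the lda() output. Below is a minimal sketch in base R; since the poster's CSV is not reproduced here, it uses the two-group iris data (versicolor vs. virginica) as a stand-in, with equal group sizes so the log-prior term can be omitted:

```r
# Two-group stand-in for the poster's data: iris, versicolor vs virginica
d   <- droplevels(subset(iris, Species != "setosa"))
X   <- as.matrix(d[, 1:4])
grp <- d$Species

# Pooled within-group covariance matrix W
W <- Reduce(`+`, lapply(levels(grp), function(g) {
       Xg <- X[grp == g, , drop = FALSE]
       (nrow(Xg) - 1) * cov(Xg)
     })) / (nrow(X) - length(levels(grp)))

# Classification function coefficients: cj = W^-1 %*% Mj,
# constant cj0 = -1/2 * Mj' W^-1 Mj  (add log(prior) for unequal groups)
Winv  <- solve(W)
coefs <- sapply(levels(grp), function(g) Winv %*% colMeans(X[grp == g, ]))
const <- sapply(levels(grp), function(g) {
           m <- colMeans(X[grp == g, ])
           -0.5 * drop(t(m) %*% Winv %*% m)
         })

# Score every case for each group; assign to the group with the larger score
scores <- X %*% coefs + rep(const, each = nrow(X))
pred   <- levels(grp)[max.col(scores)]
mean(pred == as.character(grp))  # in-sample classification rate
```

The column of `coefs` for each group, together with the matching `const`, gives the Cj = cj0 + cj1*x1 + ... equation asked about; the same recipe should carry over to the WRMA variables once the data frame is substituted.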
Please, if anybody is familiar with this or able to help, let me know. There
is a spot in the acknowledgments for you.
All the best,
Chase Mendenhall
Below are the R Commander windows (data file:
http://www.nabble.com/file/p23693355/LDA-WRMA.csv LDA-WRMA.csv):
Script Window:
#Dataset
workWRMA = read.csv("C:\\Users\\Chase\\Documents\\Interpubic Distance\\LDA\\LDA-WRMA.csv")
workWRMA
#Linear Discriminant Function
model<-lda(WRMA_SEX~WRMA_WG+WRMA_WT+WRMA_ID,data=workWRMA)
model
plot(model)
predict(model)
#Wilks' Lambda FYI: (sqrt(1 - Wilks' lambda) = canonical correlation)
X<-as.matrix(workWRMA[-4])
Y<-workWRMA$WRMA_SEX
workWRMA.manova<-manova(X~Y)
workWRMA.wilks<-summary(workWRMA.manova,test="Wilks")
workWRMA.wilks
#Group Centroids (LD1 = discriminant scores from predict())
LD1<-predict(model)$x[,1]
sum(LD1*(workWRMA$WRMA_SEX=="F"))/sum(workWRMA$WRMA_SEX=="F")
sum(LD1*(workWRMA$WRMA_SEX=="M"))/sum(workWRMA$WRMA_SEX=="M")
#Eigenvalue/Canonical Correlation
model$svd
Output Window:
> #Dataset
> workWRMA = read.csv("C:\\Users\\Chase\\Documents\\Interpubic Distance\\LDA\\LDA-WRMA.csv")
> workWRMA
WRMA_WG WRMA_WT WRMA_ID WRMA_SEX
1 57.50000 11.980000 5.300000 F
2 60.50000 12.250000 8.100000 F
3 59.10000 13.280000 6.650000 F
4 61.00000 12.200000 7.300000 F
5 59.42857 13.042857 7.700000 F
6 59.20000 13.340000 10.200000 F
7 60.20000 12.600000 5.000000 F
8 61.00000 13.250000 8.000000 F
9 59.66667 12.160000 5.300000 F
10 59.00000 12.425000 8.000000 F
11 59.71429 12.333333 6.000000 F
12 60.16667 12.500000 4.400000 F
13 60.20000 13.700000 7.600000 F
14 61.00000 12.100000 6.900000 F
15 57.88889 12.100000 6.550000 F
16 58.40000 12.740000 6.000000 F
17 60.50000 12.900000 7.500000 F
18 60.00000 13.850000 9.400000 F
19 56.50000 12.600000 5.600000 F
20 59.00000 11.700000 6.500000 F
21 60.00000 12.800000 9.100000 F
22 59.00000 12.000000 6.300000 F
23 56.00000 11.900000 4.000000 F
24 60.00000 11.800000 6.200000 F
25 60.00000 13.150000 3.800000 F
26 61.00000 12.300000 6.100000 F
27 61.25000 12.050000 4.000000 F
28 57.00000 11.950000 5.700000 F
29 59.00000 12.700000 5.200000 F
30 58.50000 13.350000 4.050000 F
31 54.00000 12.000000 5.100000 F
32 60.00000 12.000000 4.100000 F
33 58.00000 12.300000 6.500000 F
34 57.00000 12.600000 4.600000 F
35 57.00000 11.500000 6.300000 F
36 60.00000 12.200000 6.700000 F
37 60.60000 12.525000 8.766667 F
38 58.33333 13.533333 11.500000 F
39 59.25000 12.575000 5.700000 F
40 61.12500 12.733333 7.000000 F
41 60.00000 12.200000 5.900000 F
42 60.00000 11.850000 4.900000 F
43 59.25000 12.600000 8.100000 F
44 60.00000 12.300000 7.900000 F
45 61.00000 12.400000 4.500000 F
46 57.00000 12.600000 5.800000 F
47 59.00000 12.900000 6.400000 F
48 57.00000 12.000000 7.150000 F
49 55.00000 12.700000 5.200000 F
50 61.00000 12.200000 5.700000 F
51 60.00000 11.700000 7.000000 F
52 59.00000 14.400000 5.000000 F
53 60.50000 13.000000 6.300000 F
54 60.00000 12.800000 8.000000 F
55 56.50000 12.100000 6.300000 F
56 56.33333 12.800000 3.450000 F
57 59.00000 11.850000 4.000000 F
58 57.00000 12.000000 4.200000 F
59 61.50000 12.700000 5.200000 F
60 58.00000 12.600000 4.000000 F
61 57.00000 12.800000 6.800000 F
62 56.00000 11.900000 4.100000 F
63 61.00000 14.100000 7.000000 F
64 59.00000 12.300000 5.900000 F
65 60.00000 13.300000 8.100000 F
66 59.50000 12.750000 6.400000 F
67 62.00000 11.700000 6.000000 F
68 60.00000 12.600000 6.500000 F
69 60.00000 13.100000 9.200000 F
70 58.12500 12.462500 9.000000 F
71 60.16667 12.650000 10.250000 F
72 59.50000 12.866667 7.300000 F
73 58.25000 12.675000 8.800000 F
74 60.00000 12.220000 8.100000 F
75 58.77778 12.955556 4.700000 F
76 59.00000 12.800000 9.600000 F
77 60.60000 12.257143 7.500000 F
78 59.62500 13.428571 5.700000 F
79 60.00000 13.025000 6.000000 F
80 57.60000 12.025000 5.450000 F
81 60.25000 12.675000 6.100000 F
82 58.66667 12.781818 7.000000 F
83 59.42857 13.242857 8.900000 F
84 58.40000 12.980000 6.800000 F
85 60.50000 12.600000 6.200000 F
86 60.00000 10.350000 3.050000 M
87 59.00000 11.100000 1.900000 M
88 58.50000 10.083333 1.600000 M
89 60.50000 10.700000 3.000000 M
90 58.50000 11.400000 2.200000 M
91 58.60000 10.320000 2.500000 M
92 61.75000 10.580000 1.400000 M
93 60.80000 11.300000 2.800000 M
94 59.33333 10.150000 2.200000 M
95 59.60000 10.066667 2.100000 M
96 58.60000 10.466667 1.800000 M
97 59.33333 11.300000 3.000000 M
98 58.50000 11.400000 2.000000 M
99 60.20000 10.416667 1.800000 M
100 58.66667 10.050000 2.200000 M
101 62.00000 10.980000 2.400000 M
102 61.00000 11.550000 2.150000 M
103 55.00000 10.433333 2.200000 M
104 57.00000 10.450000 2.500000 M
105 59.25000 10.800000 2.800000 M
106 59.00000 10.500000 1.300000 M
107 60.00000 10.300000 2.000000 M
108 57.00000 11.450000 3.250000 M
109 56.00000 9.900000 2.400000 M
110 57.00000 10.900000 2.600000 M
111 59.83333 10.500000 2.300000 M
112 59.25000 11.075000 3.600000 M
113 57.50000 10.725000 2.300000 M
114 60.33333 10.766667 2.400000 M
115 58.00000 10.800000 2.250000 M
116 60.50000 10.850000 2.700000 M
117 60.00000 11.900000 2.700000 M
118 57.00000 10.000000 2.500000 M
119 58.00000 11.000000 1.800000 M
120 61.00000 10.500000 2.400000 M
121 59.33333 8.700000 1.200000 M
122 60.20000 10.625000 2.400000 M
123 55.00000 10.000000 2.300000 M
124 58.50000 9.800000 1.800000 M
125 59.00000 9.866667 2.050000 M
126 58.00000 10.300000 2.000000 M
127 58.50000 9.475000 1.950000 M
128 58.00000 9.750000 2.500000 M
129 58.66667 10.100000 2.000000 M
130 61.00000 10.560000 3.700000 M
131 59.00000 10.050000 1.600000 M
132 56.00000 10.400000 3.000000 M
133 59.00000 10.550000 1.600000 M
134 60.00000 10.600000 2.000000 M
135 57.00000 10.400000 2.700000 M
136 58.00000 10.500000 2.700000 M
137 57.00000 10.300000 3.400000 M
138 61.00000 11.000000 2.200000 M
139 57.00000 11.100000 2.200000 M
140 57.00000 10.600000 2.300000 M
141 59.00000 10.800000 2.700000 M
142 58.25000 10.800000 2.700000 M
143 59.50000 10.700000 1.800000 M
144 58.00000 11.000000 1.200000 M
145 58.00000 11.800000 2.500000 M
146 58.00000 10.766667 2.900000 M
147 58.40000 10.360000 2.700000 M
148 57.25000 12.275000 1.300000 M
149 59.66667 10.933333 1.500000 M
150 59.66667 10.085714 2.400000 M
151 57.08333 10.040000 1.200000 M
152 60.00000 10.480000 2.300000 M
153 59.66667 10.250000 1.500000 M
> #Linear Discriminant Function
> model<-lda(WRMA_SEX~WRMA_WG+WRMA_WT+WRMA_ID,data=workWRMA)
> model
Call:
lda(WRMA_SEX ~ WRMA_WG + WRMA_WT + WRMA_ID, data = workWRMA)
Prior probabilities of groups:
F M
0.5555556 0.4444444
Group means:
WRMA_WG WRMA_WT WRMA_ID
F 59.14702 12.57545 6.483725
M 58.76814 10.58869 2.270588
Coefficients of linear discriminants:
LD1
WRMA_WG 0.1013304
WRMA_WT -1.1900916
WRMA_ID -0.4610512
> plot(model)
> predict(model)
$class
[1] F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
[38] F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F F
[75] F F F F F F F F F F F M M M M M M M M M M M M M M M M M M M M M M M M M M
[112] M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M M
[149] M M M M M
Levels: F M
$posterior
F M
1 9.909078e-01 9.092155e-03
2 9.999655e-01 3.446277e-05
3 9.999983e-01 1.743091e-06
4 9.997338e-01 2.662160e-04
5 9.999992e-01 8.487788e-07
6 1.000000e+00 1.240453e-09
7 9.977228e-01 2.277207e-03
8 9.999997e-01 3.240250e-07
9 9.907005e-01 9.299547e-03
10 9.999910e-01 9.015325e-06
11 9.989992e-01 1.000786e-03
12 9.879689e-01 1.203115e-02
13 9.999999e-01 5.121521e-08
14 9.990285e-01 9.714536e-04
15 9.994961e-01 5.039181e-04
16 9.999281e-01 7.189251e-05
17 9.999959e-01 4.132020e-06
18 1.000000e+00 6.345395e-10
19 9.998586e-01 1.414129e-04
20 9.931840e-01 6.816027e-03
21 9.999998e-01 2.373302e-07
22 9.977888e-01 2.211185e-03
23 9.149174e-01 8.508259e-02
24 9.886453e-01 1.135473e-02
25 9.986434e-01 1.356643e-03
26 9.983032e-01 1.696775e-03
27 7.039998e-01 2.960002e-01
28 9.960977e-01 3.902309e-03
29 9.994490e-01 5.510397e-04
30 9.998429e-01 1.571487e-04
31 9.973034e-01 2.696605e-03
32 7.941107e-01 2.058893e-01
33 9.997887e-01 2.112797e-04
34 9.987451e-01 1.254929e-03
35 9.883083e-01 1.169170e-02
36 9.994376e-01 5.624386e-04
37 9.999976e-01 2.396930e-06
38 1.000000e+00 2.472603e-11
39 9.995669e-01 4.331337e-04
40 9.999662e-01 3.377496e-05
41 9.972905e-01 2.709532e-03
42 8.968123e-01 1.031877e-01
43 9.999966e-01 3.391523e-06
44 9.999681e-01 3.191964e-05
45 9.767181e-01 2.328195e-02
46 9.998816e-01 1.184359e-04
47 9.999812e-01 1.881693e-05
48 9.998249e-01 1.751186e-04
49 9.999023e-01 9.772348e-05
50 9.938317e-01 6.168348e-03
51 9.960619e-01 3.938106e-03
52 9.999999e-01 1.451635e-07
53 9.999736e-01 2.637331e-05
54 9.999979e-01 2.067820e-06
55 9.995480e-01 4.520216e-04
56 9.967328e-01 3.267176e-03
57 6.950057e-01 3.049943e-01
58 9.450219e-01 5.497811e-02
59 9.983770e-01 1.623042e-03
60 9.937325e-01 6.267495e-03
61 9.999940e-01 5.992460e-06
62 9.290384e-01 7.096159e-02
63 1.000000e+00 3.090428e-08
64 9.989404e-01 1.059617e-03
65 9.999999e-01 1.339539e-07
66 9.999500e-01 5.004774e-05
67 9.370296e-01 6.297037e-02
68 9.998907e-01 1.093299e-04
69 1.000000e+00 4.246494e-08
70 9.999993e-01 7.131770e-07
71 9.999999e-01 5.684500e-08
72 9.999953e-01 4.707545e-06
73 9.999996e-01 3.791434e-07
74 9.999677e-01 3.233061e-05
75 9.996344e-01 3.656492e-04
76 9.999999e-01 5.756549e-08
77 9.998870e-01 1.130204e-04
78 9.999933e-01 6.669888e-06
79 9.999662e-01 3.376694e-05
80 9.943574e-01 5.642595e-03
81 9.998172e-01 1.828443e-04
82 9.999909e-01 9.117026e-06
83 1.000000e+00 2.896873e-08
84 9.999956e-01 4.400247e-06
85 9.997551e-01 2.449120e-04
86 1.118353e-04 9.998882e-01
87 8.088344e-04 9.991912e-01
88 3.182563e-06 9.999968e-01
89 4.829728e-04 9.995170e-01
90 8.256460e-03 9.917435e-01
91 5.961000e-05 9.999404e-01
92 6.562593e-06 9.999934e-01
93 5.997017e-03 9.940030e-01
94 1.014214e-05 9.999899e-01
95 4.860967e-06 9.999951e-01
96 3.166796e-05 9.999683e-01
97 1.658551e-02 9.834145e-01
98 5.585009e-03 9.944150e-01
99 1.229610e-05 9.999877e-01
100 8.142222e-06 9.999919e-01
101 3.214697e-04 9.996785e-01
102 5.452544e-03 9.945475e-01
103 2.786717e-04 9.997213e-01
104 2.304626e-04 9.997695e-01
105 9.294258e-04 9.990706e-01
106 1.179463e-05 9.999882e-01
107 1.098722e-05 9.999890e-01
108 1.395204e-01 8.604796e-01
109 1.785134e-05 9.999821e-01
110 2.752661e-03 9.972473e-01
111 5.885962e-05 9.999411e-01
112 1.783427e-02 9.821657e-01
113 5.061960e-04 9.994938e-01
114 2.236694e-04 9.997763e-01
115 5.408911e-04 9.994591e-01
116 5.733275e-04 9.994267e-01
117 1.286098e-01 8.713902e-01
118 2.343744e-05 9.999766e-01
119 6.161715e-04 9.993838e-01
120 4.326539e-05 9.999567e-01
121 8.963640e-10 1.000000e+00
122 1.153872e-04 9.998846e-01
123 3.755400e-05 9.999624e-01
124 1.118491e-06 9.999989e-01
125 2.067573e-06 9.999979e-01
126 2.609586e-05 9.999739e-01
127 2.882893e-07 9.999997e-01
128 4.270992e-06 9.999957e-01
129 7.081317e-06 9.999929e-01
130 7.573557e-04 9.992426e-01
131 2.164291e-06 9.999978e-01
132 7.366522e-04 9.992633e-01
133 2.744061e-05 9.999726e-01
134 5.043407e-05 9.999496e-01
135 2.649810e-04 9.997350e-01
136 2.857448e-04 9.997143e-01
137 6.320064e-04 9.993680e-01
138 3.699555e-04 9.996300e-01
139 3.457837e-03 9.965422e-01
140 3.330767e-04 9.996669e-01
141 8.506309e-04 9.991494e-01
142 1.176206e-03 9.988238e-01
143 7.019684e-05 9.999298e-01
144 1.892658e-04 9.998107e-01
145 1.245709e-01 8.754291e-01
146 1.639240e-03 9.983608e-01
147 1.180452e-04 9.998820e-01
148 1.716352e-01 8.283648e-01
149 1.184013e-04 9.998816e-01
150 9.389177e-06 9.999906e-01
151 2.144934e-06 9.999979e-01
152 4.947553e-05 9.999505e-01
153 3.680029e-06 9.999963e-01
$x
LD1
1 -0.80961254
2 -2.11788926
3 -2.81702199
4 -1.63887851
5 -2.98560972
6 -4.51502614
7 -1.13556177
8 -3.21121050
9 -0.80427973
10 -2.43204583
11 -1.32847284
12 -0.74329959
13 -3.64339559
14 -1.33544888
15 -1.48933123
16 -1.94562057
17 -2.61481807
18 -4.67206755
19 -1.78711512
20 -0.87765265
21 -3.28415603
22 -1.14246989
23 -0.26703434
24 -0.75701601
25 -1.25711681
26 -1.20462625
27 0.08643675
28 -1.00899549
29 -1.46837770
30 -1.76239359
31 -1.09586068
32 -0.02682685
33 -1.69303805
34 -1.27539872
35 -0.75008498
36 -1.46357824
37 -2.74239885
38 -5.43229677
39 -1.52480923
40 -2.12261236
41 -1.09473729
42 -0.21715406
43 -2.66108436
44 -2.13584882
45 -0.58595351
46 -1.82866014
47 -2.25965744
48 -1.73702428
49 -1.87369947
50 -0.90119661
51 -1.00684780
52 -3.39932316
53 -2.18056581
54 -2.77699973
55 -1.51480516
56 -1.05076180
57 0.09646157
58 -0.37692329
59 -1.21505159
60 -0.89743756
61 -2.52772964
62 -0.31313946
63 -3.76173716
64 -1.31507689
65 -3.41815064
66 -2.03047848
67 -0.34313573
68 -1.84740464
69 -3.68728863
70 -3.02638958
71 -3.61896275
72 -2.58426857
73 -3.17440750
74 -2.13285173
75 -1.56450450
76 -3.61601207
77 -1.83962616
78 -2.50263849
79 -2.12266797
80 -0.92219130
81 -1.72690842
82 -2.42941777
83 -3.77688946
84 -2.60008350
85 -1.65842406
86 2.42092801
87 1.95723774
88 3.25481432
89 2.07811374
90 1.41122969
91 2.56834629
92 3.08526967
93 1.48666816
94 2.98328621
95 3.15558707
96 2.71653535
97 1.24583994
98 1.50343993
99 2.93816864
100 3.03474174
101 2.17351447
102 1.50909462
103 2.20699501
104 2.25150568
105 1.92465176
106 2.94792340
107 2.96453633
108 0.71562570
109 2.85083072
110 1.66985935
111 2.57131425
112 1.22853563
113 2.06710595
114 2.25851660
115 2.05156686
116 2.03791535
117 0.73765397
118 2.78704689
119 2.02102158
120 2.64342798
121 5.16997019
122 2.41360218
123 2.67659624
124 3.49979670
125 3.35585968
126 2.76187545
127 3.81741879
128 3.18590023
129 3.06744739
130 1.97265595
131 3.34514926
132 1.97915422
133 2.75010347
134 2.60750886
135 2.21880002
136 2.20112130
137 2.01507335
138 2.14059243
139 1.61626150
140 2.16520218
141 1.94542427
142 1.86942644
143 2.53004471
144 2.29765229
145 0.74621248
146 1.79155331
147 2.40826630
148 0.65818257
149 2.40756044
150 3.00135867
151 3.34725397
152 2.61200449
153 3.22078969
> #Wilks' Lambda FYI: (sqrt(1 - Wilks' lambda) = canonical correlation)
> X<-as.matrix(workWRMA[-4])
> Y<-workWRMA$WRMA_SEX
> workWRMA.manova<-manova(X~Y)
> workWRMA.wilks<-summary(workWRMA.manova,test="Wilks")
> workWRMA.wilks
Df Wilks approx F num Df den Df Pr(>F)
Y 1 0.18 226.40 3 149 < 2.2e-16 ***
Residuals 151
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> #Group Centroids
> sum(LD1*(workWRMA$WRMA_SEX=="F"))/sum(workWRMA$WRMA_SEX=="F")
[1] -1.897114
> sum(LD1*(workWRMA$WRMA_SEX=="M"))/sum(workWRMA$WRMA_SEX=="M")
[1] 2.371392
> #Eigenvalue/Canonical Correlation
> model$svd
[1] 26.23579
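As a cross-check on the "FYI" comment in the script above (valid here because there is a single discriminant function), the canonical correlation can be recovered directly from the reported Wilks' lambda of 0.18:

```r
# With one discriminant function, the canonical correlation r satisfies
# r^2 = 1 - Wilks' lambda; the MANOVA above reports lambda = 0.18
r <- sqrt(1 - 0.18)
r  # about 0.906
```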
(Ted Harding)
2009-May-24 19:07 UTC
[R] Animal Morphology: Deriving Classification Equation with Linear Discriminant Analysis (lda)
[Your data and output listings removed. For comments, see at end]

On 24-May-09 13:01:26, cdm wrote:
> [snip]

The first thing I did was to plot your data.
This indicates in the first place that a perfect discrimination can be
obtained on the basis of your variables WRMA_WT and WRMA_ID alone
(names abbreviated to WG, WT, ID, SEX):

D0 <- read.csv("horsesLDA.csv")
# names(D0): "WRMA_WG" "WRMA_WT" "WRMA_ID" "WRMA_SEX"
WG<-D0$WRMA_WG; WT<-D0$WRMA_WT; ID<-D0$WRMA_ID; SEX<-D0$WRMA_SEX
ix.M<-(SEX=="M"); ix.F<-(SEX=="F")
## Plot WT vs ID (M & F)
plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
points(ID[ix.M],WT[ix.M],pch="+",col="blue")
points(ID[ix.F],WT[ix.F],pch="+",col="red")
lines(ID,15.5-1.0*(ID))

and that there is a lot of possible variation in the discriminating line

WT = 15.5 - 1.0*ID

Also, it is apparent that the covariance between WT and ID for Females is
different from the covariance between WT and ID for Males. Hence the
assumption (of a common covariance matrix in the two groups) for standard
LDA (which you have been applying) does not hold.

Given that the sexes can be perfectly discriminated within the data on the
basis of the linear discriminator (WT + ID) (and others), the variable WG
is in effect a close approximation to noise. However, to the extent that
there was a common covariance matrix for the two groups (in all three
variables WG, WT, ID), and this was well estimated from the data, inclusion
of the third variable WG could yield a slightly improved discriminator, in
that the probability of misclassification (a rare event for such data)
could be minimised. But it would not make much difference! And since that
assumption does not hold, this analysis would not be valid.
If you plot WT vs WG, a common covariance is more plausible, but there is
considerable overlap for these two variables:

plot(WG,WT)
points(WG[ix.M],WT[ix.M],pch="+",col="blue")
points(WG[ix.F],WT[ix.F],pch="+",col="red")

If you plot WG vs ID, there is perhaps not much overlap, but a considerable
difference in covariance between the two groups:

plot(ID,WG)
points(ID[ix.M],WG[ix.M],pch="+",col="blue")
points(ID[ix.F],WG[ix.F],pch="+",col="red")

This looks better on a log scale, however:

lWG <- log(WG) ; lWT <- log(WT) ; lID <- log(ID)
## Plot log(WG) vs log(ID) (M & F)
plot(lID,lWG)
points(lID[ix.M],lWG[ix.M],pch="+",col="blue")
points(lID[ix.F],lWG[ix.F],pch="+",col="red")

and a common covariance still looks good for WG vs WT:

## Plot log(WT) vs log(WG) (M & F)
plot(lWG,lWT)
points(lWG[ix.M],lWT[ix.M],pch="+",col="blue")
points(lWG[ix.F],lWT[ix.F],pch="+",col="red")

but there is no improvement for WT vs ID:

## Plot log(WT) vs log(ID) (M & F)
plot(lID,lWT)
points(lID[ix.M],lWT[ix.M],pch="+",col="blue")
points(lID[ix.F],lWT[ix.F],pch="+",col="red")

So there is no simple road to applying a routine LDA to your data. To take
account of different covariances between the two groups, you would normally
be looking at a quadratic discriminator. However, as indicated above, the
fact that a linear discriminator using the variables ID & WT alone works so
well would leave considerable imprecision in conclusions to be drawn from
its results.

Sorry this is not the straightforward answer you were hoping for (which I
confess I have not sought); it is simply a reaction to what your data say.

Ted.
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
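[Editorial note: the quadratic discriminator mentioned in the reply is available as qda() in the same MASS package as lda(); it relaxes the common-covariance assumption by estimating a separate covariance matrix per group. A minimal sketch, again using the two-group iris data as a stand-in since the thread's CSV is not available:]

```r
library(MASS)

# Two-group stand-in for the poster's data: iris, versicolor vs virginica
d <- droplevels(subset(iris, Species != "setosa"))

# lda() assumes a common within-group covariance matrix;
# qda() fits one covariance matrix per group instead
fit.lda <- lda(Species ~ ., data = d)
fit.qda <- qda(Species ~ ., data = d)

# Compare in-sample classification rates
acc.lda <- mean(predict(fit.lda)$class == d$Species)
acc.qda <- mean(predict(fit.qda)$class == d$Species)
c(lda = acc.lda, qda = acc.qda)
```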