cdm
2009-May-24 13:01 UTC
[R] Animal Morphology: Deriving Classification Equation with Linear Discriminant Analysis (lda)
Fellow R Users:

I'm not extremely familiar with lda or R programming, but a recent editorial review of a manuscript submission has prompted a crash course. I'm hoping this forum can offer some much-needed advice on deriving a classification equation.

I have used three basic measurements in lda to predict two groups: male and female. I have a working model, a low Wilks' lambda, graphs, coefficients, eigenvalues, etc. (see below). I adapted the sample analysis of Fisher's/Anderson's iris data provided in the MASS library to my own data.

My final step is simply to form the classification equation, which uses the classification function coefficients to score each case for each group -- in this case male or female. A more thorough explanation is provided here:

"For cases with an equal sample size for each group the classification function coefficient (Cj) is expressed by the following equation:

  Cj = cj0 + cj1*x1 + cj2*x2 + ... + cjp*xp

where Cj is the score for the jth group, j = 1, ..., k; cj0 is the constant for the jth group; and x = raw scores of each predictor. If W = within-group variance-covariance matrix, and M = column matrix of means for group j, then the constant cj0 = (-1/2)CjMj" (Julia Barfield, John Poulsen, and Aaron French, http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm).

I am unable to complete this last step from the R output I have: I only have the linear discriminant coefficients for each predictor, not the per-group constants that the equation requires.

Please, if anybody is familiar with this or able to help, let me know. There is a spot in the acknowledgments for you.
All the best,
Chase Mendenhall

Below is the R Commander session. Data file: http://www.nabble.com/file/p23693355/LDA-WRMA.csv (LDA-WRMA.csv)

Script Window:

#Dataset
workWRMA <- read.csv("C:\\Users\\Chase\\Documents\\Interpubic Distance\\LDA\\LDA-WRMA.csv")
workWRMA

#Linear discriminant function
model <- lda(WRMA_SEX ~ WRMA_WG + WRMA_WT + WRMA_ID, data = workWRMA)
model
plot(model)
predict(model)

#Wilks' lambda  (FYI: sqrt(1 - Wilks' lambda) = canonical correlation)
X <- as.matrix(workWRMA[-4])
Y <- workWRMA$WRMA_SEX
workWRMA.manova <- manova(X ~ Y)
workWRMA.wilks <- summary(workWRMA.manova, test = "Wilks")
workWRMA.wilks

#Group centroids (LD1 is the first discriminant score from predict(model)$x)
LD1 <- predict(model)$x[, 1]
sum(LD1 * (workWRMA$WRMA_SEX == "F")) / sum(workWRMA$WRMA_SEX == "F")
sum(LD1 * (workWRMA$WRMA_SEX == "M")) / sum(workWRMA$WRMA_SEX == "M")

#Eigenvalue/canonical correlation
model$svd

Output Window (abridged):

[Echo of the 153-row data frame omitted here; columns WRMA_WG, WRMA_WT, WRMA_ID, WRMA_SEX; rows 1-85 are F, rows 86-153 are M. The full data are in the attached LDA-WRMA.csv.]

Call: lda(WRMA_SEX ~ WRMA_WG + WRMA_WT + WRMA_ID, data = workWRMA)

Prior probabilities of groups:
        F         M
0.5555556 0.4444444

Group means:
   WRMA_WG  WRMA_WT  WRMA_ID
F 59.14702 12.57545 6.483725
M 58.76814 10.58869 2.270588

Coefficients of linear discriminants:
               LD1
WRMA_WG  0.1013304
WRMA_WT -1.1900916
WRMA_ID -0.4610512

predict(model)$class assigns all 85 F rows to F and all 68 M rows to M.
[The $posterior and $x listings (153 values each) are omitted here.]

Wilks' lambda:
          Df Wilks approx F num Df den Df    Pr(>F)
Y          1  0.18   226.40      3    149 < 2.2e-16 ***
Residuals 151

Group centroids:
F: -1.897114
M:  2.371392

model$svd: 26.23579
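Regarding the constants asked about above: the quoted formula can be applied directly once the pooled within-group covariance matrix W and the group mean vectors Mj are in hand, as cj = W^-1 Mj and cj0 = -(1/2) Mj' W^-1 Mj. A minimal numeric sketch follows (in Python/NumPy rather than R, since the arithmetic is language-neutral; the group means and spreads are made up to stand in for the posted WG/WT/ID data):

```python
import numpy as np

# Hypothetical two-group data with 3 predictors, standing in for the
# real WG/WT/ID measurements (which are in the attached LDA-WRMA.csv).
rng = np.random.default_rng(0)
F = rng.normal([59.1, 12.6, 6.5], [1.5, 0.5, 1.5], size=(85, 3))
M = rng.normal([58.8, 10.6, 2.3], [1.5, 0.5, 0.6], size=(68, 3))

def pooled_within_cov(groups):
    # W = pooled within-group covariance matrix
    n = sum(len(g) for g in groups)
    k = len(groups)
    S = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups)
    return S / (n - k)

W = pooled_within_cov([F, M])
Winv = np.linalg.inv(W)

def classification_function(Winv, mean):
    # c_j  = W^-1 M_j          (predictor coefficients for group j)
    # c_j0 = -1/2 M_j' W^-1 M_j (constant for group j)
    c = Winv @ mean
    c0 = -0.5 * mean @ c
    return c0, c

c0_F, c_F = classification_function(Winv, F.mean(axis=0))
c0_M, c_M = classification_function(Winv, M.mean(axis=0))

def classify(x):
    # Assign x to the group with the larger score C_j = c_j0 + c_j . x
    return "F" if c0_F + c_F @ x > c0_M + c_M @ x else "M"
```

In R the same arithmetic is solve(W) %*% Mj for each group; with two groups and equal priors, the difference of the two classification functions reproduces the sign of the LDA discriminant.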
(Ted Harding)
2009-May-24 19:07 UTC
[R] Animal Morphology: Deriving Classification Equation with Linear Discriminant Analysis (lda)
[Your data and output listings removed. For comments, see at end]

On 24-May-09 13:01:26, cdm wrote:
> [original question quoted in full -- snipped; see the post above]

The first thing I did was to plot your data.
This indicates, in the first place, that a perfect discrimination can be obtained on the basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated below to WG, WT, ID, SEX):

D0 <- read.csv("horsesLDA.csv")
# names(D0)  # "WRMA_WG" "WRMA_WT" "WRMA_ID" "WRMA_SEX"
WG <- D0$WRMA_WG; WT <- D0$WRMA_WT; ID <- D0$WRMA_ID; SEX <- D0$WRMA_SEX
ix.M <- (SEX == "M"); ix.F <- (SEX == "F")

## Plot WT vs ID (M & F)
plot(ID, WT, xlim=c(0,12), ylim=c(8,15))
points(ID[ix.M], WT[ix.M], pch="+", col="blue")
points(ID[ix.F], WT[ix.F], pch="+", col="red")
lines(ID, 15.5 - 1.0*ID)

It also shows that there is a lot of possible variation in the discriminating line

  WT = 15.5 - 1.0*ID

Further, it is apparent that the covariance between WT and ID for females is different from the covariance between WT and ID for males. Hence the assumption of a common covariance matrix in the two groups, on which standard LDA (which you have been applying) rests, does not hold.

Given that the sexes can be perfectly discriminated within the data by a linear discriminator in WT and ID (and others), the variable WG is in effect a close approximation to noise. To the extent that there were a common covariance matrix for the two groups (in all three variables WG, WT, ID), and it were well estimated from the data, including WG could yield a slightly improved discriminator, in that the probability of misclassification (a rare event for such data) could be minimised further; but it would not make much difference. Since that assumption does not hold, however, this analysis would not be valid.
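That candidate line can be checked arithmetically against a few rows copied from the posted data: a point is called F when WT > 15.5 - 1.0*ID. A quick sketch (in Python, since the check is language-neutral):

```python
# A few (WT, ID, SEX) triples copied from the posted data frame
rows = [
    (11.98, 5.30, "F"),   # row 1
    (13.85, 9.40, "F"),   # row 18
    (11.50, 6.30, "F"),   # row 35
    (10.35, 3.05, "M"),   # row 86
    (11.90, 2.70, "M"),   # row 117
    (8.70,  1.20, "M"),   # row 121
]

def side_of_line(wt, id_):
    # Candidate discriminating line from the scatterplot: WT = 15.5 - 1.0*ID
    # Females lie above the line, males below it.
    return "F" if wt > 15.5 - 1.0 * id_ else "M"

results = [side_of_line(wt, id_) == sex for wt, id_, sex in rows]
```

All six sampled rows fall on the expected side of the line; the same check over all 153 rows is what the scatterplot shows visually.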
If you plot WT vs WG, a common covariance is more plausible, but there is considerable overlap between the sexes on these two variables:

plot(WG, WT)
points(WG[ix.M], WT[ix.M], pch="+", col="blue")
points(WG[ix.F], WT[ix.F], pch="+", col="red")

If you plot WG vs ID, there is perhaps not much overlap, but a considerable difference in covariance between the two groups:

plot(ID, WG)
points(ID[ix.M], WG[ix.M], pch="+", col="blue")
points(ID[ix.F], WG[ix.F], pch="+", col="red")

This looks better on a log scale, however:

lWG <- log(WG); lWT <- log(WT); lID <- log(ID)

## Plot log(WG) vs log(ID) (M & F)
plot(lID, lWG)
points(lID[ix.M], lWG[ix.M], pch="+", col="blue")
points(lID[ix.F], lWG[ix.F], pch="+", col="red")

and a common covariance still looks good for log(WT) vs log(WG):

## Plot log(WT) vs log(WG) (M & F)
plot(lWG, lWT)
points(lWG[ix.M], lWT[ix.M], pch="+", col="blue")
points(lWG[ix.F], lWT[ix.F], pch="+", col="red")

but there is no improvement for WT vs ID:

## Plot log(WT) vs log(ID) (M & F)
plot(lID, lWT)
points(lID[ix.M], lWT[ix.M], pch="+", col="blue")
points(lID[ix.F], lWT[ix.F], pch="+", col="red")

So there is no simple road to applying a routine LDA to your data. To take account of the different covariances between the two groups, you would normally look at a quadratic discriminator. However, as indicated above, the fact that a linear discriminator using ID and WT alone works so well would leave considerable imprecision in conclusions to be drawn from its results.

Sorry this is not the straightforward answer you were hoping for (which, I confess, I have not sought); it is simply a reaction to what your data say.

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 24-May-09  Time: 20:07:43
------------------------------ XFMail ------------------------------
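As a footnote to the quadratic-discriminator suggestion above: in R this is MASS::qda(), which fits a separate covariance matrix per group. The underlying scoring rule can be sketched numerically as follows (Python/NumPy; the group means and spreads are made-up stand-ins for the WT/ID data, with deliberately unequal covariances as in the scatterplot):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical (WT, ID) groups with *different* spreads, as observed:
# females vary much more in ID than males do.
F = rng.normal([12.6, 6.5], [0.5, 1.7], size=(85, 2))
M = rng.normal([10.6, 2.3], [0.5, 0.6], size=(68, 2))

def qda_score(x, group, prior):
    # Quadratic discriminant score for group k:
    #   -1/2 log|S_k| - 1/2 (x - m_k)' S_k^-1 (x - m_k) + log(prior)
    # Each group keeps its own covariance matrix S_k.
    m = group.mean(axis=0)
    S = np.cov(group, rowvar=False)
    d = x - m
    return (-0.5 * np.log(np.linalg.det(S))
            - 0.5 * d @ np.linalg.solve(S, d)
            + np.log(prior))

def classify_qda(x):
    # Priors taken proportional to group sizes (85 F, 68 M of 153)
    pF, pM = 85 / 153, 68 / 153
    return "F" if qda_score(x, F, pF) > qda_score(x, M, pM) else "M"
```

Unlike LDA, the decision boundary here is a conic rather than a line, which is what accommodates the unequal within-group covariances.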