Dear All,

This is more of a statistics question than a question about help for R, so forgive me.

I am using lda from the MASS package to perform linear discriminant function analysis. I have 14 cases belonging to two groups and have measured each of 37 variables. I want to find those variables that best discriminate between the two groups, and I want to visualise that and create a classification function. Please note at this stage it is a proof of concept problem - I realise that I must follow this up with a much more robust analysis involving cross-validation.

1) First problem, I got this error message:

> z <- lda(C0GRP_NA ~ ., dpi30)
Warning message:
variables are collinear in: lda.default(x, grouping, ...)

I guess this is not a good thing, however, I *did* get a result and it discriminated perfectly between my groups. Can anyone explain what this means? Does it invalidate my results?

2) My analysis came up with one discriminant variable. How do I control how many are produced? I currently assume this is the only significant discriminant variable found. Can I insist it finds more?

3) More of a tip - when my analysis only finds one significant variable, what is a good way to visualise this graphically?

4) Can I work out from the coefficients which sub groups of my variables are better at discriminating than others? I guess I could simply perform a t-test first to select the best variables...?

5) How do I turn my discriminant function into a classification function? i.e. when I plot the scores for the groups I can see graphically that all the values for one group are below 0.1 and all the values for the other group are above 1. But how do I turn my discriminant function into a classification function?

Many thanks in advance for your help

Mick
michael watson (IAH-C) wrote:
> Dear All
>
> This is more of a statistics question than a question about help for R,
> so forgive me.
>
> I am using lda from the MASS package to perform linear discriminant
> function analysis. I have 14 cases belonging to two groups and have
> measured each of 37 variables. I want to find those variables that best
> discriminate between the two groups, and I want to visualise that and
> create a classification function. Please note at this stage it is a
> proof of concept problem - I realise that I must follow this up with a
> much more robust analysis involving cross-validation.
>
> 1) First problem, I got this error message:
>
>> z <- lda(C0GRP_NA ~ ., dpi30)
>
> Warning message:
> variables are collinear in: lda.default(x, grouping, ...)
>
> I guess this is not a good thing, however, I *did* get a result and it
> discriminated perfectly between my groups. Can anyone explain what this
> means? Does it invalidate my results?

Well, 14 cases and 37 variables mean that not that many degrees of freedom are left.... ;-)
Of course, you get a perfect fit - with arbitrary data.

> 2) My analysis came up with one discriminant variable. How do I control
> how many are produced? I currently assume this is the only significant
> discriminant variable found. Can I insist it finds more?

Well, if projection into one dimension is already perfect, it's hard to find a second one that improves the result...

> 3) More of a tip - when my analysis only finds one significant variable,
> what is a good way to visualise this graphically?

Depends on the amount of data: either all data on one line, maybe jittered, or maybe even better two boxplots, provided there really is perfect (and sensible) separation ....

> 4) Can I work out from the coefficients which sub groups of my variable
> are better at discriminating than others? I guess I could simply
> perform a t-test first to select the best variables...?

No, because you ignore possible projections in this case.

> 5) How do I turn my discriminant function into a classification
> function? i.e. when I plot the scores for the groups I can see
> graphically that all the values for one group are below 0.1 and all the
> values for the other group are above 1. But how do I turn my
> discriminant function into a classification function?

What about looking for the point where it has the value 0.5 for the posterior?

Uwe Ligges

> Many thanks in advance for your help
>
> Mick
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
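The suggestion in answer 5 (cut where the posterior is 0.5) can be sketched with predict() on an lda fit. This is only a minimal sketch on made-up data: the data frame `toy`, its columns `grp`, `x1`, `x2`, and the new observation are hypothetical stand-ins, not the dpi30 data from the thread.

```r
library(MASS)

set.seed(1)
# Hypothetical stand-in data: two groups, two variables (not the poster's data)
n <- 20
toy <- data.frame(
  grp = factor(rep(c("A", "B"), each = n)),
  x1  = c(rnorm(n, 0), rnorm(n, 2)),
  x2  = c(rnorm(n, 0), rnorm(n, 1))
)

fit  <- lda(grp ~ ., data = toy)
pred <- predict(fit, toy)

# predict() returns the discriminant scores ($x), the posterior
# probabilities ($posterior) and the assigned class ($class).
scores <- pred$x[, 1]
post_B <- pred$posterior[, "B"]

# With two groups, assigning the class with the larger posterior is the
# same as cutting the posterior for "B" at 0.5.
manual <- ifelse(post_B > 0.5, "B", "A")

# Classifying a new (hypothetical) case uses the same fitted object.
new_pred <- predict(fit, newdata = data.frame(x1 = 1, x2 = 0.5))
new_pred$class
new_pred$posterior
```

Plotting `scores` against `post_B` shows where on the LD1 axis the posterior crosses 0.5; that crossing point depends on the priors, which lda() takes from the group proportions unless the `prior` argument is set.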
Thanks for the answers Uwe!

So this is a common problem in biology - a small number of cases and many, many variables (genes, proteins, metabolites, etc.)!

Under these conditions, is discriminant function analysis not an ideal method to use then? Are there alternatives?

> 1) First problem, I got this error message:
>
>> z <- lda(C0GRP_NA ~ ., dpi30)
>
> Warning message:
> variables are collinear in: lda.default(x, grouping, ...)
>
> I guess this is not a good thing, however, I *did* get a result and it
> discriminated perfectly between my groups. Can anyone explain what
> this means? Does it invalidate my results?

Well, 14 cases and 37 variables mean that not that many degrees of freedom are left.... ;-)
Of course, you get a perfect fit - with arbitrary data.
michael watson (IAH-C) wrote:
> Thanks for the answers Uwe!
>
> So this is a common problem in biology - few number of cases and many,
> many variables (genes, proteins, metabolites, etc etc)!
>
> Under these conditions, is discriminant function analysis not an ideal
> method to use then? Are there alternatives?

No, obviously not "an ideal method", if used as is on the whole data.
Alternatives are certainly described in the literature - I am not specialised in this field (I mean, this gene stuff), hence I do not want to give misleading references here.

Uwe Ligges

>>1) First problem, I got this error message:
>>
>>>z <- lda(C0GRP_NA ~ ., dpi30)
>>
>>Warning message:
>>variables are collinear in: lda.default(x, grouping, ...)
>>
>>I guess this is not a good thing, however, I *did* get a result and it
>>discriminated perfectly between my groups. Can anyone explain what
>>this means? Does it invalidate my results?
>
> Well, 14 cases and 37 variables mean that not that many degrees of
> freedom are left.... ;-)
> Of course, you get a perfect fit - with arbitrary data.
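One way to probe the "more variables than cases" concern is to run the same kind of call on pure noise of the same shape (14 cases, 37 variables) and compare the accuracy on the training data with a leave-one-out estimate. The sketch below is only an illustration under assumptions: the data frame `noise` and the column `grp` are made up, the leave-one-out loop is written by hand, and the numbers it produces depend on the seed.

```r
library(MASS)

set.seed(42)
# Pure noise with the shape described in the thread: 14 cases, 37 variables.
# The grouping is arbitrary, so there is nothing real to discriminate.
noise <- as.data.frame(matrix(rnorm(14 * 37), nrow = 14))
noise$grp <- factor(rep(c("a", "b"), each = 7))

# lda() warns "variables are collinear" here, as in the original post.
fit   <- suppressWarnings(lda(grp ~ ., data = noise))
resub <- mean(predict(fit, noise)$class == noise$grp)

# Hand-written leave-one-out: refit without case i, then classify case i.
loo_class <- character(nrow(noise))
for (i in seq_len(nrow(noise))) {
  fit_i <- suppressWarnings(lda(grp ~ ., data = noise[-i, ]))
  loo_class[i] <- as.character(predict(fit_i, noise[i, , drop = FALSE])$class)
}
loo <- mean(loo_class == noise$grp)

c(resubstitution = resub, leave_one_out = loo)
```

Since the groups are arbitrary, any accuracy above chance on the training data reflects fitting to noise rather than real structure, which is why the held-out estimate is the more honest of the two.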
Hi,

I need to emulate the result I get in SPSS for discriminant analysis. Specifically, the canonical discriminant function coefficients and, most importantly, the classification results.

Classification Results (a)

| | | job | Predicted: 1.00 customer service | Predicted: 2.00 mechanic | Predicted: 3.00 dispatch | Total |
| -------- | ----- | ---------------------- | ---- | ---- | ---- | ----- |
| Original | Count | 1.00 customer service | 70 | 11 | 4 | 85 |
| | | 2.00 mechanic | 16 | 62 | 15 | 93 |
| | | 3.00 dispatch | 3 | 12 | 51 | 66 |
| | % | 1.00 customer service | 82.4 | 12.9 | 4.7 | 100.0 |
| | | 2.00 mechanic | 17.2 | 66.7 | 16.1 | 100.0 |
| | | 3.00 dispatch | 4.5 | 18.2 | 77.3 | 100.0 |

a. 75.0% of original grouped cases correctly classified.

Something like the table above. I am not sure how the table will turn out. It basically has the original group and the predicted group and, based on that, the % of cases correctly classified.

Thank you
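A table of this kind can be put together in R from an lda fit with table() and prop.table(). The sketch below is a hedged illustration, not a reproduction of the SPSS run: the job data are not available here, so the built-in iris data set (three groups) stands in for them, and the priors and coefficient scaling may differ from what SPSS reports.

```r
library(MASS)

# Stand-in data: iris has three groups, like the three job categories.
fit  <- lda(Species ~ ., data = iris)
pred <- predict(fit, iris)$class

# Original group (rows) against predicted group (columns)
counts <- table(Original = iris$Species, Predicted = pred)

# Row percentages, as in the "%" block of the SPSS table
pct <- round(100 * prop.table(counts, margin = 1), 1)

# Overall percentage of cases correctly classified (the footnote)
overall <- 100 * sum(diag(counts)) / sum(counts)

addmargins(counts, margin = 2)  # counts with a row total column
pct
overall

# Coefficients of the linear discriminants as stored by lda();
# these do not include a constant term.
fit$scaling
```

By default lda() uses the observed group proportions as priors; the `prior` argument can be set explicitly if the comparison with SPSS requires other priors.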