thr3ads.net - R help - [R] Discriminant Function Analysis [Jul 2005]

michael watson (IAH-C)

2005-Jul-05 14:29 UTC

[R] Discriminant Function Analysis

Dear All

This is more of a statistics question than a question about help for R,
so forgive me.

I am using lda from the MASS package to perform linear discriminant
function analysis.  I have 14 cases belonging to two groups and have
measured each of 37 variables.  I want to find those variables that best
discriminate between the two groups, and I want to visualise that and
create a classification function.  Please note at this stage it is a
proof of concept problem - I realise that I must follow this up with a
much more robust analysis involving cross-validation.

1) First problem, I got this error message:

> z <- lda(C0GRP_NA ~ ., dpi30)
Warning message:
variables are collinear in: lda.default(x, grouping, ...)

I guess this is not a good thing, however, I *did* get a result and it
discriminated perfectly between my groups.  Can anyone explain what this
means?  Does it invalidate my results?

2) My analysis came up with one discriminant variable.  How do I control
how many are produced?  I currently assume this is the only significant
discriminant variable found.  Can I insist it finds more?

3) More of a tip - when my analysis only finds one significant variable,
what is a good way to visualise this graphically?

4) Can I work out from the coefficients which sub groups of my variable
are better at discriminating than others?  I guess I could simply
perform a t-test first to select the best variables...?

5) How do I turn my discriminant function into a classification
function?  i.e. when I plot the scores for the groups I can see
graphically that all the values for one group are below 0.1 and all the
values for the other group are above 1.  But how do I turn my
discriminant function into a classification function?

Many thanks in advance for your help

Mick

Uwe Ligges

2005-Jul-05 18:42 UTC

[R] Discriminant Function Analysis

michael watson (IAH-C) wrote:
> Dear All
> 
> This is more of a statistics question than a question about help for R,
> so forgive me.
> 
> I am using lda from the MASS package to perform linear discriminant
> function analysis.  I have 14 cases belonging to two groups and have
> measured each of 37 variables.  I want to find those variables that best
> discriminate between the two groups, and I want to visualise that and
> create a classification function.  Please note at this stage it is a
> proof of concept problem - I realise that I must follow this up with a
> much more robust analysis involving cross-validation.
> 
> 1) First problem, I got this error message:
> 
>>z <- lda(C0GRP_NA ~ ., dpi30)
> 
> Warning message: 
> variables are collinear in: lda.default(x, grouping, ...) 
> 
> I guess this is not a good thing, however, I *did* get a result and it
> discriminated perfectly between my groups.  Can anyone explain what this
> means?  Does it invalidate my results?
Well, 14 cases and 37 variables mean that not that many degrees of 
freedom are left.... ;-)
Of course, you get a perfect fit - with arbitrary data.
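
The "perfect fit with arbitrary data" point can be checked on pure noise. The sketch below does not use the poster's dpi30 data (which is not available); it assumes 14 simulated cases, 37 standard-normal variables and group labels assigned without reference to the variables. The names noise, grp, msg and resub are made up for illustration.

```r
library(MASS)

set.seed(1)                         # arbitrary seed, for reproducibility only
n <- 14; p <- 37                    # same shape as the data described above
noise <- as.data.frame(matrix(rnorm(n * p), n, p))
noise$grp <- factor(rep(c("A", "B"), each = n / 2))  # labels unrelated to the variables

# Fit LDA and collect any warning text instead of printing it
msg <- NULL
fit <- withCallingHandlers(
  lda(grp ~ ., data = noise),
  warning = function(w) {
    msg <<- c(msg, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(msg)

# Resubstitution accuracy: predicting the same cases that were used for fitting
resub <- mean(predict(fit, noise)$class == noise$grp)
print(resub)
```

With more variables than cases the within-group covariance cannot have full rank, which is what the collinearity warning reports. The resubstitution accuracy would be expected to come out at or near 1 even though the labels carry no information, so a perfect separation on the training cases says little by itself.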
> 
> 2) My analysis came up with one discriminant variable.  How do I control
> how many are produced?  I currently assume this is the only significant
> discriminant variable found.  Can I insist it finds more?
Well, if projection into one dimension is already perfect, it's hard to 
find a second one that improves the result...

> 3) More of a tip - when my analysis only finds one significant variable,
> what is a good way to visualise this graphically?
Depends on the amount of data: either all data on one line, maybe 
jittered, or maybe even better two boxplots, given there would be really 
perfect (and sensible) separation ....
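
A minimal sketch of both suggestions, assuming two species from the built-in iris data as a stand-in for the poster's two groups (the names d and scores are made up for illustration):

```r
library(MASS)

# Stand-in data: two iris species, since the original dpi30 is not available
d <- droplevels(subset(iris, Species != "setosa"))
fit <- lda(Species ~ ., data = d)

# One discriminant score (LD1) per case
scores <- data.frame(LD1 = predict(fit, d)$x[, 1], group = d$Species)

op <- par(mfrow = c(1, 2))
# All cases of a group on one line, jittered so overlapping points stay visible
stripchart(LD1 ~ group, data = scores, method = "jitter", pch = 1,
           vertical = TRUE, main = "Jittered LD1 scores")
# One box per group
boxplot(LD1 ~ group, data = scores, main = "LD1 by group")
par(op)
```

With a single discriminant, plot(fit) is another option: it draws per-group histograms of the scores.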

> 4) Can I work out from the coefficients which sub groups of my variable
> are better at discriminating than others?  I guess I could simply
> perform a t-test first to select the best variables...?
No, because you ignore possible projections in this case.
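
A small simulated illustration of why variable-by-variable t-tests can miss a good projection. Everything here is made up for the example: x1 is drawn without reference to the group, x2 is x1 plus a small group shift, and only the contrast x2 - x1 separates the groups cleanly.

```r
library(MASS)

set.seed(42)
n <- 50                                    # cases per group (made-up size)
grp <- factor(rep(c("A", "B"), each = n))
x1 <- rnorm(2 * n, sd = 3)                 # drawn without reference to the group
x2 <- x1 + ifelse(grp == "B", 1, 0) + rnorm(2 * n, sd = 0.05)
sim <- data.frame(grp, x1, x2)

# Marginal view: one two-sample t-test per variable
p_marginal <- sapply(sim[c("x1", "x2")], function(v) {
  t.test(v[sim$grp == "A"], v[sim$grp == "B"])$p.value
})
print(p_marginal)

# Joint view: LDA is free to use a combination such as x2 - x1
fit <- lda(grp ~ x1 + x2, data = sim)
acc <- mean(predict(fit, sim)$class == sim$grp)
print(acc)
print(fit$scaling)   # the two coefficients have opposite signs
```

The shift of 1 is small next to the spread of x2 (standard deviation about 3), so the marginal tests see little, while the difference between the two variables carries nearly all of the group signal.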

> 5) How do I turn my discriminant function into a classification
> function?  i.e. when I plot the scores for the groups I can see
> graphically that all the values for one group are below 0.1 and all the
> values for the other group are above 1.  But how do I turn my
> discriminant function into a classification function?
What about looking for the point where the posterior probability 
equals 0.5?
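
A sketch of that idea for two groups, again assuming two iris species as stand-in data (d, cutoff and manual are illustrative names). predict() on an lda fit already returns the class and the posterior probabilities; the cutoff on LD1 where both posteriors equal 0.5 follows from the group means on LD1 and the priors, because the LD1 scores are scaled to unit within-group variance.

```r
library(MASS)

d <- droplevels(subset(iris, Species != "setosa"))
fit <- lda(Species ~ ., data = d)
pred <- predict(fit, d)              # $class, $posterior and $x (LD1 scores)

ld1 <- pred$x[, 1]
m <- tapply(ld1, d$Species, mean)    # group means on LD1, in level order
pr <- fit$prior                      # priors, in the same level order

# LD1 score at which both posterior probabilities equal 0.5
cutoff <- as.numeric(mean(m) + log(pr[2] / pr[1]) / (m[1] - m[2]))

# Classify by comparing each score with the cutoff
lev <- levels(d$Species)
low <- lev[which.min(m)]
high <- lev[which.max(m)]
manual <- ifelse(ld1 < cutoff, low, high)

print(cutoff)
print(table(manual, predicted = pred$class))
```

For new cases there is no need to compute the cutoff by hand: predict(fit, newdata) applies the same rule, where newdata is a data frame with the original variables.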

Uwe Ligges


> Many thanks in advance for your help
> 
> Mick
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

michael watson (IAH-C)

2005-Jul-07 08:58 UTC

[R] Discriminant Function Analysis

Thanks for the answers Uwe!

So this is a common problem in biology - a small number of cases and many,
many variables (genes, proteins, metabolites, etc.)!

Under these conditions, is discriminant function analysis not an ideal
method to use then?  Are there alternatives?

Uwe Ligges

2005-Jul-07 09:05 UTC

[R] Discriminant Function Analysis

michael watson (IAH-C) wrote:
> Thanks for the answers Uwe!
> 
> So this is a common problem in biology - a small number of cases and many,
> many variables (genes, proteins, metabolites, etc.)!
> 
> Under these conditions, is discriminant function analysis not an ideal
> method to use then?  Are there alternatives?
No, obviously not "an ideal method", if used as is on the whole data.

Alternatives are certainly described in the literature - I am not 
specialised in this field (I mean, this gene stuff), hence do not want 
to specify misleading references here.

Uwe Ligges

Senthil Nambi

2008-Nov-15 00:13 UTC

[R] Discriminant Function Analysis

Hi,

I need to emulate the results I get in SPSS for discriminant analysis.

Specifically, the canonical discriminant function coefficients and, most
importantly, the classification results.

 |          |       | job                   | Predicted Group Membership                            | Total |
 |          |       |                       | 1.00 customer service | 2.00 mechanic | 3.00 dispatch |       |
 | -------- | ----- | --------------------- | --------------------- | ------------- | ------------- | ----- |
 | Original | Count | 1.00 customer service | 70                    | 11            | 4             | 85    |
 |          |       | 2.00 mechanic         | 16                    | 62            | 15            | 93    |
 |          |       | 3.00 dispatch         | 3                     | 12            | 51            | 66    |
 |          | ----- | --------------------- | --------------------- | ------------- | ------------- | ----- |
 |          | %     | 1.00 customer service | 82.4                  | 12.9          | 4.7           | 100.0 |
 |          |       | 2.00 mechanic         | 17.2                  | 66.7          | 16.1          | 100.0 |
 |          |       | 3.00 dispatch         | 4.5                   | 18.2          | 77.3          | 100.0 |
 | -------- | ----- | --------------------- | --------------------- | ------------- | ------------- | ----- |
a. 75.0% of original grouped cases correctly classified.

Something like the table above.

I am not sure how the table will turn out. It basically shows the original group
against the predicted group and, based on that, the percentage correctly classified.

Thank you
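
A minimal sketch of how such a table could be put together with lda from MASS. It uses the built-in iris data as a stand-in for the three job groups, since the SPSS data set is not available; counts, pct and overall are illustrative names. The coefficients in fit$scaling may differ from the SPSS canonical coefficients in sign or scaling, and lda uses the observed group proportions as priors unless the prior argument is set, so the numbers need not match SPSS exactly.

```r
library(MASS)

# Stand-in three-group data; replace iris and Species with your own data and grouping
fit <- lda(Species ~ ., data = iris)
pred <- predict(fit, iris)

print(fit$scaling)                  # coefficients of the linear discriminants

# Original group (rows) against predicted group (columns)
counts <- table(Original = iris$Species, Predicted = pred$class)
pct <- round(100 * prop.table(counts, margin = 1), 1)   # row percentages
overall <- 100 * sum(diag(counts)) / sum(counts)        # percent correctly classified

print(addmargins(counts, margin = 2))   # counts plus a row-total column
print(pct)
cat(sprintf("%.1f%% of original grouped cases correctly classified.\n", overall))
```

Calling lda with CV = TRUE returns leave-one-out predictions in the class component, which can be tabulated the same way to get a cross-validated version of the table.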
