thr3ads.net - R help - [R] Could not fit correct values in discriminant analysis by bruto. [Apr 2007]

Kuhn, Max

2007-Apr-09 13:25 UTC

[R] Could not fit correct values in discriminant analysis by bruto.

Shuji,

I suspect that bruto blows up because your data are linearly separable.
To see this (if you didn't already know), try

   library(lattice)
   splom(~x, groups = y)

and look at the first row. If you are trying to do classification, there
are a few methods that would choke on this (logistic regression) and a
few that won't (trees, SVMs etc). I would guess that bruto is in the
former group.
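
The separability claim can also be checked numerically. The following is a minimal sketch in base R, using the values posted in the original message below; `separated_by` is a hypothetical helper written for this illustration, not a function from any package. It tests whether a single threshold on one column splits the two classes, and then records the warnings that `glm` raises when fitted on the separating column.

```r
# Data as posted in the original message (y: 1 for rows 1-10, 0 for rows 11-20)
x <- cbind(
  band1 = c(-1.206780, -0.294938, -0.267303, -1.206780, -1.151518,
            -1.179146, -0.294938, -0.350200, -1.151518,  1.362954,
            -0.239675, -0.294938, -0.294938, -1.123883,  0.616905,
            -0.267303,  1.611637, -1.123883, -0.239675,  2.274797),
  band2 = c(-1.448007, -0.113222, -0.241567, -1.448007, -0.806286,
            -1.396670, -0.241567, -0.267239, -0.857623, -1.396670,
             1.118883, -1.268325, -0.729278, -0.703612,  1.144548,
             1.657930,  1.041874, -0.677941, -0.626604,  1.118883),
  band3 = c(-1.084431, -0.888895, -1.040979, -1.040979, -0.671630,
            -1.453775, -1.453775, -1.084431, -0.649901, -2.235926,
             1.457551, -0.497817, -0.106745, -0.150196, -0.150196,
             1.044750,  0.610225,  0.262605, -0.128473,  1.805171)
)
y <- rep(c(1, 0), each = 10)

# Hypothetical helper: TRUE if the two class ranges of v do not overlap,
# i.e. one cutoff on v classifies every observation correctly.
separated_by <- function(v, g) {
  g <- factor(g)
  stopifnot(nlevels(g) == 2)
  a <- range(v[g == levels(g)[1]])
  b <- range(v[g == levels(g)[2]])
  a[2] < b[1] || b[2] < a[1]
}

sep <- apply(x, 2, separated_by, g = y)
print(sep)

# Logistic regression on a separating predictor: the coefficients diverge,
# and glm typically reports this through warnings, which are collected here.
warn <- character(0)
glmFit <- withCallingHandlers(
  glm(y ~ band3, data = data.frame(x, y = y), family = binomial),
  warning = function(w) {
    warn <<- c(warn, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(warn)
```

For these data only band3 passes the range check, which matches the splom panel row mentioned above.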

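However, if you are trying to do classification, try using bruto via fda:

   > # the response must be a named factor column so the formula can find it;
   > # fda() defaults to method = polyreg, add method = bruto to fit with bruto
   > tmp <- data.frame(x, y2 = factor(y))
   > 
   > fdaFit <- fda(y2~., tmp)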
   > fdaFit
   Call:
   fda(formula = y2 ~ ., data = tmp)
   
   Dimension: 1 
   
   Percent Between-Group Variance Explained:
    v1 
   100 
   
   Degrees of Freedom (per dimension): 5 
   
   Training Misclassification Error: 0 ( N = 20 )
   > 
   > predict(fdaFit, type = "posterior")[1:3,]
     0 1
   2 0 1
   2 0 1
   2 0 1

Max

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Shuji Kawaguchi
Sent: Sunday, April 08, 2007 10:47 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Could not fit correct values in discriminant analysis by
bruto.

Dear R-users,

I would like to use the "bruto" function in the mda package for flexible
discriminant analysis.
I tried, for example, the following approach.

 > x
        band1     band2     band3
1   -1.206780 -1.448007 -1.084431
2   -0.294938 -0.113222 -0.888895
3   -0.267303 -0.241567 -1.040979
4   -1.206780 -1.448007 -1.040979
5   -1.151518 -0.806286 -0.671630
6   -1.179146 -1.396670 -1.453775
7   -0.294938 -0.241567 -1.453775
8   -0.350200 -0.267239 -1.084431
9   -1.151518 -0.857623 -0.649901
10   1.362954 -1.396670 -2.235926
11  -0.239675  1.118883  1.457551
12  -0.294938 -1.268325 -0.497817
13  -0.294938 -0.729278 -0.106745
14  -1.123883 -0.703612 -0.150196
15   0.616905  1.144548 -0.150196
16  -0.267303  1.657930  1.044750
17   1.611637  1.041874  0.610225
18  -1.123883 -0.677941  0.262605
19  -0.239675 -0.626604 -0.128473
20   2.274797  1.118883  1.805171

 > y
[1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0

 > fit <- bruto(x,y)

However, the resulting fit$fitted.values are enormously high (or low).
Running bruto(x[,2:3], y) works well (the values are close to 1 or 0).
Are the values in column 1 wrong, or is some option needed?
I contacted the package maintainer, but the problem could not be solved.

Thanks

Shuji Kawaguchi

 > R.version
platform       i386-apple-darwin8.8.1
arch           i386
os             darwin8.8.1
system         i386, darwin8.8.1
version.string R version 2.4.0 (2006-10-03)

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Kuhn, Max

2007-Apr-09 18:23 UTC

[R] Could not fit correct values in discriminant analysis by bruto.

Shuji,

My answer is "it depends". Using regression on zero/one dummy variables
can work, but there are more direct approaches to classification that
might work better.

In the case that you showed us, the data were linearly separable by one
predictor. If the data that you posted are representative of your
typical data sets, you might want to use a very simple approach:
identify the important predictor and use an ROC curve to find a good
cutoff for it. You don't need the other predictors.
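
As a rough illustration of that single-predictor approach, here is a minimal base R sketch. It assumes band3 is the important predictor (it is the one column whose class ranges do not overlap in the posted data) and picks the cutoff that maximizes sensitivity + specificity - 1 (Youden's J). That criterion is an assumption made for this sketch; other ways of reading a "good" cutoff off an ROC curve are equally valid.

```r
# band3 and y as posted in the original message
band3 <- c(-1.084431, -0.888895, -1.040979, -1.040979, -0.671630,
           -1.453775, -1.453775, -1.084431, -0.649901, -2.235926,
            1.457551, -0.497817, -0.106745, -0.150196, -0.150196,
            1.044750,  0.610225,  0.262605, -0.128473,  1.805171)
y <- rep(c(1, 0), each = 10)

# Candidate cutoffs: midpoints between consecutive distinct values
v <- sort(unique(band3))
cutoffs <- (v[-length(v)] + v[-1]) / 2

# One ROC point per cutoff; class 1 has the lower band3 values,
# so an observation is called class 1 when band3 <= cutoff.
roc <- as.data.frame(t(sapply(cutoffs, function(cut) {
  pred <- as.integer(band3 <= cut)
  c(cutoff      = cut,
    sensitivity = mean(pred[y == 1] == 1),
    specificity = mean(pred[y == 0] == 0))
})))

# Youden's J (an assumed criterion) and the cutoff that maximizes it
roc$J <- roc$sensitivity + roc$specificity - 1
best <- roc[which.max(roc$J), ]
print(best)
```

Plotting `1 - roc$specificity` against `roc$sensitivity` gives the ROC curve itself. With data this cleanly separated the curve passes through the top-left corner, so the choice of cutoff matters more on noisier data.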

However, if these data represent an "easy" case, you might want to try
other approaches to classification. I usually start with either
randomForest or a bagged tree (see the randomForest and ipred packages).
If I can get good performance with those models, I'll try simpler models
that are more interpretable, like single trees (the rpart package), FDA
with method = mars (using the mda package) and other approaches. 
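
As one concrete instance of the simpler models listed above, here is a minimal sketch of a single classification tree on the posted data with rpart. The `minsplit = 10` setting is an assumption made only because the example has 20 rows; it is not a recommendation from this thread. The other models mentioned (randomForest, ipred's bagging, fda with method = mars) take the same formula and data arguments, so they can be swapped in the same way if those packages are installed.

```r
library(rpart)

# Data as posted in the original message; the class label is stored as a factor
d <- data.frame(
  band1 = c(-1.206780, -0.294938, -0.267303, -1.206780, -1.151518,
            -1.179146, -0.294938, -0.350200, -1.151518,  1.362954,
            -0.239675, -0.294938, -0.294938, -1.123883,  0.616905,
            -0.267303,  1.611637, -1.123883, -0.239675,  2.274797),
  band2 = c(-1.448007, -0.113222, -0.241567, -1.448007, -0.806286,
            -1.396670, -0.241567, -0.267239, -0.857623, -1.396670,
             1.118883, -1.268325, -0.729278, -0.703612,  1.144548,
             1.657930,  1.041874, -0.677941, -0.626604,  1.118883),
  band3 = c(-1.084431, -0.888895, -1.040979, -1.040979, -0.671630,
            -1.453775, -1.453775, -1.084431, -0.649901, -2.235926,
             1.457551, -0.497817, -0.106745, -0.150196, -0.150196,
             1.044750,  0.610225,  0.262605, -0.128473,  1.805171),
  y = factor(rep(c(1, 0), each = 10))
)

# Single classification tree; minsplit is lowered (assumption) for 20 rows
treeFit <- rpart(y ~ ., data = d, method = "class",
                 control = rpart.control(minsplit = 10))
print(treeFit)

# Training-set predictions and confusion table
pred <- predict(treeFit, type = "class")
print(table(observed = d$y, predicted = pred))
```

The training error here says nothing about performance on new data; with only 20 observations, a resampling estimate would be more informative.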

Max

-----Original Message-----
From: Shuji Kawaguchi [mailto:kawaguchi at math.kyushu-u.ac.jp]
Sent: Monday, April 09, 2007 1:49 PM
To: Kuhn, Max
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Could not fit correct values in discriminant analysis
by bruto.

Dear Max,

Thank you very much! Your sample code is very helpful.

For a linearly separable problem, should I use fda with linear regression
instead of bruto, unless I apply some dimension reduction step first?

Cheers.

Shuji

