Kuhn, Max
2007-Apr-09 13:25 UTC
[R] Could not fit correct values in discriminant analysis by bruto.
Shuji,
I suspect that bruto blows up because your data are linearly separable.
To see this (if you didn't already know), try
library(lattice)
splom(~x, groups = y)
and look at the first row. If you are trying to do classification, there
are a few methods that would choke on this (logistic regression) and a
few that won't (trees, SVMs, etc.). I would guess that bruto is in the
former group.
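The splom check is visual, but the same point can be checked numerically. The sketch below is not from the thread. It re-enters Shuji's posted values by hand, so x and y are reconstructions. It then tests each predictor for non-overlapping class ranges and contrasts logistic regression with a single classification tree from the rpart package, which ships with R. On these values only band3 separates the classes by itself: every class-1 value is at most -0.649901 and every class-0 value is at least -0.497817. Under complete separation the logistic-regression maximum-likelihood estimates do not exist, so glm() typically warns that fitted probabilities are numerically 0 or 1 and reports very large coefficients. The tree, by contrast, just splits on band3.

```r
# Shuji's 20 x 3 data, re-entered by hand from the values in his message
x <- data.frame(
  band1 = c(-1.206780, -0.294938, -0.267303, -1.206780, -1.151518,
            -1.179146, -0.294938, -0.350200, -1.151518,  1.362954,
            -0.239675, -0.294938, -0.294938, -1.123883,  0.616905,
            -0.267303,  1.611637, -1.123883, -0.239675,  2.274797),
  band2 = c(-1.448007, -0.113222, -0.241567, -1.448007, -0.806286,
            -1.396670, -0.241567, -0.267239, -0.857623, -1.396670,
             1.118883, -1.268325, -0.729278, -0.703612,  1.144548,
             1.657930,  1.041874, -0.677941, -0.626604,  1.118883),
  band3 = c(-1.084431, -0.888895, -1.040979, -1.040979, -0.671630,
            -1.453775, -1.453775, -1.084431, -0.649901, -2.235926,
             1.457551, -0.497817, -0.106745, -0.150196, -0.150196,
             1.044750,  0.610225,  0.262605, -0.128473,  1.805171)
)
y <- rep(c(1, 0), each = 10)   # rows 1-10 are class 1, rows 11-20 class 0

# One predictor separates the classes if the class-wise ranges do not overlap
separates <- sapply(x, function(v) {
  r1 <- range(v[y == 1])
  r0 <- range(v[y == 0])
  r1[2] < r0[1] || r0[2] < r1[1]
})
print(separates)   # only band3 is TRUE for these values

dat <- data.frame(x, cls = factor(y))

# Logistic regression: with separable data the MLE does not exist, so glm()
# usually warns and returns very large coefficients
lr <- glm(cls ~ band1 + band2 + band3, data = dat, family = binomial)
print(coef(lr))

# A single classification tree copes without trouble
library(rpart)
tree <- rpart(cls ~ band1 + band2 + band3, data = dat, method = "class")
print(tree)
print(table(predicted = predict(tree, type = "class"), observed = dat$cls))
```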
However, if you are trying to do classification, try using bruto via fda:
> library(mda)
> tmp <- data.frame(x, y2 = factor(y))
>
> fdaFit <- fda(y2 ~ ., data = tmp)
> fdaFit
Call:
fda(formula = y2 ~ ., data = tmp)
Dimension: 1
Percent Between-Group Variance Explained:
v1
100
Degrees of Freedom (per dimension): 5
Training Misclassification Error: 0 ( N = 20 )
>
> predict(fdaFit, type = "posterior")[1:3,]
0 1
2 0 1
2 0 1
2 0 1
Max
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Shuji Kawaguchi
Sent: Sunday, April 08, 2007 10:47 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Could not fit correct values in discriminant analysis by bruto.
Dear R-users,
I would like to use the "bruto" function in the mda package for flexible
discriminant analysis.
For example, I tried the following approach:
> x
band1 band2 band3
1 -1.206780 -1.448007 -1.084431
2 -0.294938 -0.113222 -0.888895
3 -0.267303 -0.241567 -1.040979
4 -1.206780 -1.448007 -1.040979
5 -1.151518 -0.806286 -0.671630
6 -1.179146 -1.396670 -1.453775
7 -0.294938 -0.241567 -1.453775
8 -0.350200 -0.267239 -1.084431
9 -1.151518 -0.857623 -0.649901
10 1.362954 -1.396670 -2.235926
11 -0.239675 1.118883 1.457551
12 -0.294938 -1.268325 -0.497817
13 -0.294938 -0.729278 -0.106745
14 -1.123883 -0.703612 -0.150196
15 0.616905 1.144548 -0.150196
16 -0.267303 1.657930 1.044750
17 1.611637 1.041874 0.610225
18 -1.123883 -0.677941 0.262605
19 -0.239675 -0.626604 -0.128473
20 2.274797 1.118883 1.805171
> y
[1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
> fit <- bruto(x,y)
However, the resulting fit$fitted.values are enormously high (or low).
Running bruto(x[,2:3], y) works well (the fitted values are close to 1 or 0).
Are the values in column 1 wrong, or is an appropriate option needed?
I contacted the package maintainer, but the problem could not be solved.
Thanks
Shuji Kawaguchi
> R.version
platform i386-apple-darwin8.8.1
arch i386
os darwin8.8.1
system i386, darwin8.8.1
version.string R version 2.4.0 (2006-10-03)
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
----------------------------------------------------------------------
LEGAL NOTICE\ Unless expressly stated otherwise, this messag...{{dropped}}
Kuhn, Max
2007-Apr-09 18:23 UTC
[R] Could not fit correct values in discriminant analysis by bruto.
Shuji,
My answer is "it depends". Using regression on zero/one dummy variables can work, but there are more direct approaches to classification that might work better.
In the case that you showed us, the data were linearly separable by one predictor. If the data that you posted are representative of your typical data sets, you might want to use a very simple approach: identify the important predictor and use an ROC curve to find a good cutoff for it. You don't need the other data.
However, if these data are only an "easy" case, you might want to try other approaches to classification. I usually start with either randomForest or a bagged tree (see the randomForest and ipred packages). If I can get good performance with those models, I'll try simpler models that are more interpretable, like single trees (the rpart package), FDA with method = mars (using the mda package), and other approaches.
Max
-----Original Message-----
From: Shuji Kawaguchi [mailto:kawaguchi at math.kyushu-u.ac.jp]
Sent: Monday, April 09, 2007 1:49 PM
To: Kuhn, Max
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Could not fit correct values in discriminant analysis by bruto.
Dear Max,
Thank you very much! Your sample code is very helpful.
For a linearly separable problem, should I use fda with linear regression instead of bruto, unless I first apply some dimension-reduction step?
Cheers.
Shuji
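Max's suggestion to take the important predictor and choose a cutoff from an ROC curve can be sketched in base R. This is a minimal sketch, not Max's code, and it rests on three assumptions. First, the important predictor is band3, the only column whose class-wise ranges do not overlap in the posted data. Second, a case is predicted as class 1 when band3 is at or below the cutoff. Third, a "good" cutoff is the one that maximizes Youden's J (sensitivity + specificity - 1), which is one common criterion among several. The helper roc_points() is hypothetical, written only for this illustration.

```r
# band3 values from Shuji's data (rows 1-10 are class 1, rows 11-20 class 0)
band3 <- c(-1.084431, -0.888895, -1.040979, -1.040979, -0.671630,
           -1.453775, -1.453775, -1.084431, -0.649901, -2.235926,
            1.457551, -0.497817, -0.106745, -0.150196, -0.150196,
            1.044750,  0.610225,  0.262605, -0.128473,  1.805171)
y <- rep(c(1, 0), each = 10)

# Hypothetical helper: empirical ROC points at every candidate cutoff.
# Assumption: predict class 1 when score <= cutoff (class 1 has low band3).
roc_points <- function(score, truth) {
  u <- sort(unique(score))
  cutoffs <- (head(u, -1) + tail(u, -1)) / 2   # midpoints between observed values
  sens <- sapply(cutoffs, function(k) mean(score[truth == 1] <= k))
  spec <- sapply(cutoffs, function(k) mean(score[truth == 0] >  k))
  data.frame(cutoff = cutoffs, sens = sens, spec = spec)
}

roc <- roc_points(band3, y)
roc$J <- roc$sens + roc$spec - 1      # Youden's J at each cutoff
best <- roc[which.max(roc$J), ]       # first cutoff with the largest J
print(best)
```

On these data the maximum J is 1, reached at a cutoff of about -0.574, halfway between the largest class-1 value (-0.649901) and the smallest class-0 value (-0.497817). Because the cutoff is chosen on the same 20 rows it is evaluated on, this is a resubstitution result and is likely to be optimistic for new data.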