Kuhn, Max
2007-Apr-09 13:25 UTC
[R] Could not fit correct values in discriminant analysis by bruto.
Shuji,

I suspect that bruto blows up because your data are linearly separable. To see this (if you didn't already know), try

library(lattice)
splom(~x, groups = y)

and look at the first row. If you are trying to do classification, there are a few methods that would choke on this (logistic regression) and a few that won't (trees, SVMs, etc.). I would guess that bruto is in the former group.

However, if you are trying to do classification, try using bruto via fda:

> library(mda)
> tmp <- data.frame(x, y2 = factor(y))
> fdaFit <- fda(y2 ~ ., tmp)
> fdaFit
Call:
fda(formula = y2 ~ ., data = tmp)

Dimension: 1

Percent Between-Group Variance Explained:
 v1
100

Degrees of Freedom (per dimension): 5

Training Misclassification Error: 0 ( N = 20 )

> predict(fdaFit, type = "posterior")[1:3,]
  0 1
2 0 1
2 0 1
2 0 1

Max

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Shuji Kawaguchi
Sent: Sunday, April 08, 2007 10:47 PM
To: r-help at stat.math.ethz.ch
Subject: [R] Could not fit correct values in discriminant analysis by bruto.

Dear R-users,

I would like to use the "bruto" function in the mda package for flexible discriminant analysis. I tried, for example, the following approach.
> x
       band1     band2     band3
1  -1.206780 -1.448007 -1.084431
2  -0.294938 -0.113222 -0.888895
3  -0.267303 -0.241567 -1.040979
4  -1.206780 -1.448007 -1.040979
5  -1.151518 -0.806286 -0.671630
6  -1.179146 -1.396670 -1.453775
7  -0.294938 -0.241567 -1.453775
8  -0.350200 -0.267239 -1.084431
9  -1.151518 -0.857623 -0.649901
10  1.362954 -1.396670 -2.235926
11 -0.239675  1.118883  1.457551
12 -0.294938 -1.268325 -0.497817
13 -0.294938 -0.729278 -0.106745
14 -1.123883 -0.703612 -0.150196
15  0.616905  1.144548 -0.150196
16 -0.267303  1.657930  1.044750
17  1.611637  1.041874  0.610225
18 -1.123883 -0.677941  0.262605
19 -0.239675 -0.626604 -0.128473
20  2.274797  1.118883  1.805171
> y
[1] 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
> fit <- bruto(x,y)

However, the resulting fit$fitted.values are enormously high (or low). bruto(x[,2:3], y) runs fine (the fitted values are close to 1 or 0). Is something wrong with the values in column 1, or do I need to set some option? I contacted the package maintainer, but the problem could not be solved.

Thanks

Shuji Kawaguchi

> R.version
platform       i386-apple-darwin8.8.1
arch           i386
os             darwin8.8.1
system         i386, darwin8.8.1
version.string R version 2.4.0 (2006-10-03)

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
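
A numeric check can complement the splom plot suggested in the reply above. The following is only a sketch in base R, not code from the thread: it re-enters the posted data and asks, for each predictor on its own, whether the value ranges of the two classes overlap. The helper name separates is hypothetical and introduced here for illustration.

```r
# Data re-entered from the original message (20 rows, 3 predictors).
x <- data.frame(
  band1 = c(-1.206780, -0.294938, -0.267303, -1.206780, -1.151518,
            -1.179146, -0.294938, -0.350200, -1.151518,  1.362954,
            -0.239675, -0.294938, -0.294938, -1.123883,  0.616905,
            -0.267303,  1.611637, -1.123883, -0.239675,  2.274797),
  band2 = c(-1.448007, -0.113222, -0.241567, -1.448007, -0.806286,
            -1.396670, -0.241567, -0.267239, -0.857623, -1.396670,
             1.118883, -1.268325, -0.729278, -0.703612,  1.144548,
             1.657930,  1.041874, -0.677941, -0.626604,  1.118883),
  band3 = c(-1.084431, -0.888895, -1.040979, -1.040979, -0.671630,
            -1.453775, -1.453775, -1.084431, -0.649901, -2.235926,
             1.457551, -0.497817, -0.106745, -0.150196, -0.150196,
             1.044750,  0.610225,  0.262605, -0.128473,  1.805171)
)
y <- rep(c(1, 0), each = 10)

# Hypothetical helper: a single predictor separates two classes when
# the class-wise ranges do not overlap.
separates <- function(v, g) {
  r <- tapply(v, g, range)  # one c(min, max) pair per class
  r[[1]][2] < r[[2]][1] || r[[2]][2] < r[[1]][1]
}

sep <- sapply(x, separates, g = y)
print(sep)
```

For these data only band3 comes back TRUE, which matches the remark that the classes are linearly separable by one predictor. The check looks at one column at a time, so it cannot detect separation that needs a combination of predictors.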
Kuhn, Max
2007-Apr-09 18:23 UTC
[R] Could not fit correct values in discriminant analysis by bruto.
Shuji,

My answer is "it depends". Using regression on zero/one dummy variables can work, but there are more direct approaches to classification that might work better.

In the case that you showed us, the data were linearly separable by one predictor. If the data that you posted are representative of your typical data sets, you might want to use a very simple approach: identify the important predictor and use an ROC curve to find a good cutoff for it. You don't need the other predictors.

However, if these data represent an "easy" case, you might want to try other approaches to classification. I usually start with either randomForest or a bagged tree (see the randomForest and ipred packages). If I can get good performance with those models, I'll try simpler models that are more interpretable, like single trees (the rpart package), FDA with method = mars (using the mda package) and other approaches.

Max

-----Original Message-----
From: Shuji Kawaguchi [mailto:kawaguchi at math.kyushu-u.ac.jp]
Sent: Monday, April 09, 2007 1:49 PM
To: Kuhn, Max
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Could not fit correct values in discriminant analysis by bruto.

Dear Max,

Thank you very much! Your sample code is very helpful. For a linearly separable problem, should I use fda with linear regression instead of bruto, unless I first apply some dimension reduction?

Cheers.

Shuji
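
The "one predictor plus ROC cutoff" idea from Max's reply can be sketched in base R without any extra packages. This is an illustration under stated assumptions, not the method used in the thread: band3 is taken as the important predictor, candidate cutoffs are the midpoints between neighbouring observed values, and the cutoff is chosen by maximising Youden's J (sensitivity + specificity - 1), which is one common criterion among several.

```r
# band3 and the class labels, re-entered from the original message.
band3 <- c(-1.084431, -0.888895, -1.040979, -1.040979, -0.671630,
           -1.453775, -1.453775, -1.084431, -0.649901, -2.235926,
            1.457551, -0.497817, -0.106745, -0.150196, -0.150196,
            1.044750,  0.610225,  0.262605, -0.128473,  1.805171)
y <- rep(c(1, 0), each = 10)

# Candidate cutoffs: midpoints between consecutive distinct values.
v <- sort(unique(band3))
cuts <- (head(v, -1) + tail(v, -1)) / 2

# Class 1 has the lower band3 values in these data, so the rule is
# "predict class 1 when band3 <= cutoff".
roc <- as.data.frame(t(sapply(cuts, function(cut) {
  pred <- as.integer(band3 <= cut)
  c(cutoff      = cut,
    sensitivity = mean(pred[y == 1] == 1),
    specificity = mean(pred[y == 0] == 0))
})))

# Youden's J is an assumed selection criterion, not one named in the thread.
roc$youden <- roc$sensitivity + roc$specificity - 1
best <- roc[which.max(roc$youden), ]
print(best)
```

Because the classes do not overlap on band3, the selected cutoff falls in the gap between the two groups and classifies the training data perfectly. On data that are not separable, the same table of sensitivity and specificity values can be plotted or inspected to trade one error type against the other.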