Hi all -
I am trying to tune an SVM model by optimizing the cross-validation
accuracy. Maximizing this value doesn't necessarily seem to minimize the
number of misclassifications. Can anyone tell me how the
cross-validation accuracy is defined? In the output below, for example,
cross-validation accuracy is 92.2%, while the number of correctly
classified samples is (1476+170)/(1476+170+4) = 99.7% !?
Thanks for any help.
Regards - Ton
---
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 8
gamma: 0.007
Number of Support Vectors: 1015
( 148 867 )
Number of Classes: 2
Levels:
false true
5-fold cross-validation on training data:
Total Accuracy: 92.24242
Single Accuracies:
90 93.33333 94.84848 92.72727 90.30303
Contingency Table
                 predclasses
    origclasses  false  true
          false   1476     0
          true       4   170
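As a sanity check on the numbers above (a Python sketch, since the original session is in R): the reported Total Accuracy is just the average of the five fold accuracies (the folds are equal-sized here, 1650 / 5 = 330), whereas the 99.7% figure is the resubstitution accuracy computed from the contingency table on the full training set.

```python
# Per-fold accuracies reported by svm() above (percent).
folds = [90, 93.33333, 94.84848, 92.72727, 90.30303]

# With equal-sized folds, total cross-validation accuracy is the
# mean of the per-fold accuracies.
total = sum(folds) / len(folds)
print(round(total, 5))  # 92.24242 -- matches "Total Accuracy"

# Resubstitution accuracy from the contingency table (full training set).
train_acc = (1476 + 170) / (1476 + 0 + 4 + 170)
print(round(100 * train_acc, 2))  # 99.76 -- the "99.7%" in the question
```

The two numbers measure different things: the first is estimated on held-out folds, the second on the same data the model was fit to.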
---
The 99.7% accuracy you quoted, I take it, is the accuracy on the
training set. If so, that number hardly means anything (other than,
perhaps, as a self-fulfilling prophecy). Usually what one wants is a
model that can predict, with high accuracy, data that were not used to
train it. That is what cross-validation tries to emulate: it gives you
an estimate of how well you can expect your model to do on data it has
not seen.

Andy
---
Ton van Daelen wrote:
> Can anyone tell me how the cross-validation accuracy is defined? [...]

Percent correctly classified is an improper scoring rule: the
percentage is maximized when the predicted values are bogus. In
addition, one can add a very important predictor and see the percentage
actually decrease.

Frank Harrell

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
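Frank's warning about percent-correct is easy to quantify from the class counts in the contingency table above (a hypothetical baseline, sketched in Python): with classes this imbalanced, a degenerate predictor that ignores the input entirely already scores close to the cross-validated 92.2%.

```python
# Class counts from the contingency table above.
n_false = 1476 + 0    # 1476 true negatives + false positives
n_true = 4 + 170      # 174 false negatives + true positives

# A "classifier" that always predicts "false", regardless of the input,
# is still rewarded handsomely by percent-correct on imbalanced classes.
always_false_acc = n_false / (n_false + n_true)
print(round(100 * always_false_acc, 1))  # 89.5
```

That a bogus constant prediction scores 89.5% is one concrete sense in which percent-correct can be "maximized when the predicted values are bogus"; proper scoring rules (e.g. the Brier score or log-likelihood) do not share this defect.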