Hi all -

I am trying to tune an SVM model by optimizing the cross-validation accuracy. Maximizing this value doesn't necessarily seem to minimize the number of misclassifications. Can anyone tell me how the cross-validation accuracy is defined? In the output below, for example, cross-validation accuracy is 92.2%, while the number of correctly classified samples is (1476+170)/(1476+170+4) = 99.7% !?

Thanks for any help.

Regards - Ton

---
Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  8
      gamma:  0.007

Number of Support Vectors:  1015

 ( 148 867 )

Number of Classes:  2

Levels:
 false true

5-fold cross-validation on training data:

Total Accuracy: 92.24242
Single Accuracies:
 90 93.33333 94.84848 92.72727 90.30303

Contingency Table
           predclasses
origclasses false true
      false  1476    0
      true      4  170
The 99.7% accuracy you quoted, I take it, is the accuracy on the training set. If so, that number hardly means anything (other than, perhaps, a self-fulfilling prophecy): the model is being scored on the same data it was fitted to. Usually one wants the model to predict, with high accuracy, data that weren't used to train it. That's what cross-validation tries to emulate: in each of the 5 folds the model is refit without that fold and then scored on it, so the 92.2% is an estimate of how well you can expect your model to do on data that it has not seen.

Andy

> From: Ton van Daelen
>
> In the output below, for example,
> cross-validation accuracy is 92.2%, while the number of correctly
> classified samples is (1476+170)/(1476+170+4) = 99.7% !?
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
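The two figures in the posted output can be recomputed by hand to show that they measure different things. What follows is a minimal base-R sketch, not e1071's own code. It assumes the five folds are of equal size (1650 / 5 = 330 samples each); the output does not say so, but every reported fold accuracy corresponds to a whole number of samples out of 330.

```r
# Counts copied from the contingency table in the posted output
# (rows = original classes, columns = classes predicted on the training data).
conf <- matrix(c(1476,   0,
                    4, 170),
               nrow = 2, byrow = TRUE,
               dimnames = list(origclasses = c("false", "true"),
                               predclasses = c("false", "true")))

n <- sum(conf)                             # total number of samples
train_acc <- 100 * sum(diag(conf)) / n     # accuracy on the training data itself

# Per-fold accuracies copied from the "Single Accuracies" line.
fold_acc <- c(90, 93.33333, 94.84848, 92.72727, 90.30303)

# ASSUMPTION: equal fold sizes; this is not stated in the output.
fold_size    <- n / length(fold_acc)
fold_correct <- round(fold_acc / 100 * fold_size)  # correct held-out predictions per fold

cv_acc <- 100 * sum(fold_correct) / n      # pooled accuracy on held-out samples

cat(sprintf("training accuracy:  %.2f%% (%d errors)\n",
            train_acc, as.integer(n - sum(diag(conf)))))
cat(sprintf("5-fold CV accuracy: %.2f%% (%d errors)\n",
            cv_acc, as.integer(n - sum(fold_correct))))
```

Under that assumption the pooled held-out figure reproduces the reported Total Accuracy of 92.24242: 128 of the 1650 samples are misclassified when they are left out of the fit, against only 4 when the model is scored on its own training data.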
Ton van Daelen wrote:
> I am trying to tune an SVM model by optimizing the cross-validation
> accuracy. Maximizing this value doesn't necessarily seem to minimize the
> number of misclassifications. Can anyone tell me how the
> cross-validation accuracy is defined?

Percent correctly classified is an improper scoring rule: it can be maximized by predicted values that are bogus. In addition, one can add a very important predictor and have the percent actually decrease.

Frank Harrell

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
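A small base-R sketch of why percent correct can be a weak target on data like these. The class counts come from the rows of the posted contingency table. The predicted probabilities in the second half are made-up numbers for illustration only, not output from any fitted model, and the Brier score is brought in here as one example of a proper scoring rule; it is not named anywhere in the thread.

```r
# Class counts taken from the rows of the posted contingency table.
n_false <- 1476
n_true  <- 4 + 170
n       <- n_false + n_true

# A "model" that ignores every predictor and always predicts the majority class.
baseline_acc <- 100 * max(n_false, n_true) / n
cat(sprintf("always predicting 'false': %.2f%% correct\n", baseline_acc))

# HYPOTHETICAL predicted probabilities of class "true" for six samples.
y       <- c(1, 1, 1, 0, 0, 0)                    # 1 = true, 0 = false
p_sharp <- c(0.95, 0.90, 0.85, 0.10, 0.05, 0.15)  # confident and well separated
p_vague <- c(0.55, 0.51, 0.60, 0.49, 0.45, 0.48)  # barely on the right side of 0.5

accuracy <- function(p, y) 100 * mean((p > 0.5) == (y == 1))  # percent correct at a 0.5 cutoff
brier    <- function(p, y) mean((p - y)^2)                    # mean squared error; lower is better

# Both sets classify every sample correctly at the 0.5 cutoff ...
print(c(sharp = accuracy(p_sharp, y), vague = accuracy(p_vague, y)))
# ... but the Brier score tells them apart.
print(c(sharp = brier(p_sharp, y), vague = brier(p_vague, y)))
```

With 1476 of 1650 samples in class "false", a rule that never looks at the data already scores about 89.45% correct, not far below the 92.2% cross-validated figure. The second half shows only one aspect of the problem: percent correct cannot distinguish well-separated predictions from ones that barely clear the cutoff, whereas a score computed on the predicted probabilities can.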