Andrew Digby
2013-Nov-15 00:31 UTC
[R] Inconsistent results between caret+kernlab versions
I'm using caret to assess classifier performance (and it's great!). However, I've found that my results differ between R2.* and R3.* - reported accuracies are reduced dramatically. I suspect that a code change to kernlab ksvm may be responsible (see version 5.16-24 here: http://cran.r-project.org/web/packages/caret/news.html). I get very different results between caret_5.15-61 + kernlab_0.9-17 and caret_5.17-7 + kernlab_0.9-19 (see below).

Can anyone please shed any light on this?

Thanks very much!

### To replicate:

require(repmis) # For downloading from https
df <- source_data('https://dl.dropboxusercontent.com/u/47973221/data.csv', sep=',')
require(caret)
svm.m1 <- train(df[,-1],df[,1],method='svmRadial',metric='Kappa',tunelength=5,trControl=trainControl(method='repeatedcv', number=10, repeats=10, classProbs=TRUE))
svm.m1
sessionInfo()

### Results - R2.15.2

> svm.m1
1241 samples
   7 predictors
  10 classes: 'O27479', 'O31403', 'O32057', 'O32059', 'O32060', 'O32078', 'O32089', 'O32663', 'O32668', 'O32676'

No pre-processing
Resampling: Cross-Validation (10 fold, repeated 10 times)

Summary of sample sizes: 1116, 1116, 1114, 1118, 1118, 1119, ...

Resampling results across tuning parameters:

  C     Accuracy  Kappa  Accuracy SD  Kappa SD
  0.25  0.684     0.63   0.0353       0.0416
  0.5   0.729     0.685  0.0379       0.0445
  1     0.756     0.716  0.0357       0.0418

Tuning parameter 'sigma' was held constant at a value of 0.247
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 1 and sigma = 0.247.

> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] e1071_1.6-1 class_7.3-5 kernlab_0.9-17 repmis_0.2.4 caret_5.15-61 reshape2_1.2.2 plyr_1.8 lattice_0.20-10 foreach_1.4.0 cluster_1.14.3

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_2.15.2 digest_0.6.0 evaluate_0.4.3 formatR_0.7 grid_2.15.2 httr_0.2 iterators_1.0.6 knitr_1.1 RCurl_1.95-4.1 stringr_0.6.2 tools_2.15.2

### Results - R3.0.2

> require(caret)
> svm.m1 <- train(df[,-1],df[,1],method='svmRadial',metric='Kappa',tunelength=5,trControl=trainControl(method='repeatedcv', number=10, repeats=10, classProbs=TRUE))
Loading required package: class
Warning messages:
1: closing unused connection 4 (https://dl.dropboxusercontent.com/u/47973221/df.Rdata)
2: executing %dopar% sequentially: no parallel backend registered
> svm.m1
1241 samples
   7 predictors
  10 classes: 'O27479', 'O31403', 'O32057', 'O32059', 'O32060', 'O32078', 'O32089', 'O32663', 'O32668', 'O32676'

No pre-processing
Resampling: Cross-Validation (10 fold, repeated 10 times)

Summary of sample sizes: 1118, 1117, 1115, 1117, 1116, 1118, ...

Resampling results across tuning parameters:

  C     Accuracy  Kappa  Accuracy SD  Kappa SD
  0.25  0.372     0.278  0.033        0.0371
  0.5   0.39      0.297  0.0317       0.0358
  1     0.399     0.307  0.0289       0.0323

Tuning parameter 'sigma' was held constant at a value of 0.2148907
Kappa was used to select the optimal model using the largest value.
The final values used for the model were C = 1 and sigma = 0.215.

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_NZ.UTF-8/en_NZ.UTF-8/en_NZ.UTF-8/C/en_NZ.UTF-8/en_NZ.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] e1071_1.6-1 class_7.3-9 kernlab_0.9-19 repmis_0.2.6.2 caret_5.17-7 reshape2_1.2.2 plyr_1.8 lattice_0.20-24 foreach_1.4.1 cluster_1.14.4

loaded via a namespace (and not attached):
[1] codetools_0.2-8 compiler_3.0.2 digest_0.6.3 grid_3.0.2 httr_0.2 iterators_1.0.6 RCurl_1.95-4.1 stringr_0.6.2 tools_3.0.2
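A side point on the "To replicate" code above: the resampling results depend on the random CV splits, so runs are only comparable if a seed is set first. Note also that caret's train() argument is spelled tuneLength; the lowercase tunelength above is not a train() argument and appears to be silently ignored, which would explain why only three C values (caret's default tuneLength = 3) show up in the output. A minimal sketch of a reproducible call, assuming the same df; the seed value is arbitrary:

## A minimal sketch, assuming df as loaded above.
require(caret)
set.seed(123)  # fix the repeated-CV splits so runs can be compared
svm.m1 <- train(df[, -1], df[, 1],
                method = 'svmRadial',
                metric = 'Kappa',
                tuneLength = 5,  # note the capital L; 'tunelength' is ignored
                trControl = trainControl(method = 'repeatedcv',
                                         number = 10, repeats = 10,
                                         classProbs = TRUE))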
Or not! The issue is with kernlab.

Background: SVM models do not naturally produce class probabilities. A secondary model (via Platt scaling) is fit to the raw model output, and a logistic function is used to translate the raw SVM output to probability-like numbers (i.e. between 0 and 1, summing to one). In ksvm(), you need to use the option prob.model = TRUE to get that second model.

I discovered some time ago that there can be a discrepancy between the predicted classes that come naturally from the SVM model and those derived from the class with the largest class probability. This is most likely due to natural error in the secondary probability model and should not be unexpected. That is the case for your data. If you use the same tuning parameters as those suggested by train() and go straight to ksvm():

> newSVM <- ksvm(x = as.matrix(df[,-1]),
+                y = df[,1],
+                kernel = rbfdot(sigma = svm.m1$bestTune$.sigma),
+                C = svm.m1$bestTune$.C,
+                prob.model = TRUE)
>
> predict(newSVM, df[43,-1])
[1] O32078
10 Levels: O27479 O31403 O32057 O32059 O32060 O32078 ... O32676
> predict(newSVM, df[43,-1], type = "probabilities")
         O27479     O31403    O32057    O32059     O32060    O32078
[1,] 0.08791826 0.05911645 0.2424997 0.1036943 0.06968587 0.1648394
         O32089     O32663     O32668     O32676
[1,] 0.04890477 0.05210836 0.09838892 0.07284396

Note that, based on the probability model, the class with the largest probability is O32057 (p = 0.24) while the basic SVM model predicts O32078 (p = 0.16).

Somebody (maybe me) saw this discrepancy, and that led me to follow this rule: if prob.model = TRUE, use the class with the maximum probability; otherwise use the class prediction from ksvm(). Therefore:

> predict(svm.m1, df[43,-1])
[1] O32057
10 Levels: O27479 O31403 O32057 O32059 O32060 O32078 ... O32676

That change occurred between the two caret versions that you tested with.

(On a side note, this can also occur with ksvm() and rpart() if cost-sensitive training is used, because the class designation takes the costs into account but the class probability predictions do not. I alerted both package maintainers to the issue some time ago.)

HTH,

Max

On Fri, Nov 15, 2013 at 1:56 PM, Max Kuhn <mxkuhn at gmail.com> wrote:
> I've looked into this a bit and the issue seems to be with caret. I've
> been looking at the svn check-ins and nothing stands out to me as the
> issue so far. The final models that are generated are the same, and
> I'll try to figure out the difference.
>
> Two small notes:
>
> 1) You should set the seed to ensure reproducibility.
> 2) You really shouldn't use character strings that are all numbers as
> factor levels with caret when you want class probabilities. It should
> give you a warning about this.
>
> Max
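To see how widespread the discrepancy Max describes is, the two prediction rules can be compared across the whole data set. A minimal sketch, assuming the newSVM fit above (prob.model = TRUE); the variable names and the max.col() bookkeeping are illustrative, not from the thread:

## A minimal sketch, assuming the newSVM model fit above.
raw_class  <- predict(newSVM, df[, -1])                         # classes from the SVM decision rule
probs      <- predict(newSVM, df[, -1], type = "probabilities") # Platt-scaled probabilities, one column per class
prob_class <- factor(colnames(probs)[max.col(probs)],           # class with the largest probability per row
                     levels = levels(raw_class))
which(raw_class != prob_class)  # rows (like row 43) where the two rules disagree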
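Regarding note 2 in the quoted message above: with classProbs = TRUE, caret stores the per-class probabilities in columns named after the outcome's factor levels, so levels that are not valid R variable names (e.g. all-digit strings like "27479") break that bookkeeping. A minimal sketch of one way to sanitise the outcome first, assuming it lives in df[, 1]; the make.names() step is a suggestion, not something from the thread:

## A minimal sketch, assuming the outcome column is df[, 1].
y <- factor(df[, 1])
levels(y) <- make.names(levels(y))  # e.g. "27479" becomes "X27479", a valid R name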