Pierre Dangauthier
2008-May-13 21:27 UTC
[R] Non-reproducibility of SVM classification with 'e1071' libSVM package
Hello,

When I call the svm() function several times, I get different results. Am I missing something, or is there some random number generation in the C library? If it is the latter, is it possible to fix the seed?

Thank you,
Pierre

### Example
library('e1071')

x = rnorm(100)   # training set
y = rnorm(100)
c = runif(100) > 0.5
x2 = rnorm(100)  # test set
y2 = rnorm(100)

# learning an SVM model twice, predicting twice, and the results differ!
set.seed(15)
model = svm(data.frame(x, y), as.factor(c), probability=TRUE)
pred1 = predict(model, newdata = data.frame(x2, y2), probability=TRUE)
probas1 = as.numeric(attr(pred1, "probabilities")[, "TRUE"])

set.seed(15)
model = svm(data.frame(x, y), as.factor(c), probability=TRUE)
pred2 = predict(model, newdata = data.frame(x2, y2), probability=TRUE)
probas2 = as.numeric(attr(pred2, "probabilities")[, "TRUE"])

sum(pred1 != pred2)      # It should be 0
sum(probas1 != probas2)  # It should be 0

plot(probas1, probas2, xlim=c(0.4,0.6), ylim=c(0.4,0.6), col="red")
# Rerun the whole example to see some strange patterns, especially around the 0.5 value.

## Technical details:
I'm using the latest R version, 2.7.0, on an up-to-date Windows Vista (same problem on another computer with Windows XP).

> sessionInfo()
R version 2.7.0 (2008-04-22)
i386-pc-mingw32

locale:
LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252

attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base

other attached packages:
[1] e1071_1.5-18 class_7.2-41

loaded via a namespace (and not attached):
[1] tools_2.7.0
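
The comparison in the example above can be wrapped in a small helper so that any function can be tested the same way. This is only a minimal sketch: check_reproducible() is a hypothetical name, not part of e1071, and it assumes the function being tested returns a numeric vector. It is shown here on rnorm(), which uses only R's own generator, so it runs without e1071.

```r
# Hypothetical helper, not part of e1071: call f() twice with the same
# R seed and compare the two numeric results.
check_reproducible <- function(f, seed = 15) {
  set.seed(seed)
  first <- f()
  set.seed(seed)
  second <- f()
  list(same         = identical(first, second),
       n_diff       = sum(first != second),
       max_abs_diff = max(abs(first - second)))
}

# Sanity check with a function that draws only from R's own generator.
res <- check_reproducible(function() rnorm(10))
print(res$same)
```

If a function draws its random numbers only from R's generator, `same` should be TRUE. Passing a function that fits the svm() model and returns the predicted probabilities would show whether set.seed() controls that computation as well.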