Pierre Dangauthier
2008-May-13 21:27 UTC
[R] Un-reproductibility of SVM classification with 'e1071' libSVM package
Hello,
When I call the svm() function several times on the same data, I get different results.
Am I missing something, or is there some random number generation in the C library?
If it is the latter, is it possible to fix the seed?
Thank you
Pierre
### Example
library('e1071')
x = rnorm(100) # train set
y = rnorm(100)
c = runif(100)>0.5
x2 = rnorm(100)# test set
y2 = rnorm(100)
# learning a svm model 2 times, predicting 2 times, and results differ !
set.seed(15)
model = svm(data.frame(x, y), as.factor(c), probability=TRUE )
pred1 = predict( model, newdata = data.frame(x2, y2), probability=TRUE)
probas1 = as.numeric(attr(pred1,"probabilities")[,"TRUE"])
set.seed(15)
model = svm(data.frame(x, y), as.factor(c), probability=TRUE )
pred2 = predict( model, newdata = data.frame(x2, y2), probability=TRUE)
probas2 = as.numeric(attr(pred2,"probabilities")[,"TRUE"])
sum(pred1 != pred2) # It should be 0
sum(probas1 != probas2) # It should be 0
plot(probas1,probas2,xlim=c(0.4,0.6),ylim=c(0.4,0.6),col="red")
# redo the whole example to see some strange patterns! Especially around the 0.5 value.
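One way to narrow this down, assuming the variation (if any) comes from the probability estimation step rather than from the SVM fit itself, is to repeat the comparison with probability=FALSE and compare the raw decision values. This is only a sketch of such a check: the names cl, fit_dec, dec1 and dec2 are introduced here for illustration and are not part of the example above.

```r
library(e1071)

set.seed(1)
x  <- rnorm(100)                    # train set
y  <- rnorm(100)
cl <- as.factor(runif(100) > 0.5)   # class labels
x2 <- rnorm(100)                    # test set
y2 <- rnorm(100)

# Fit without probability estimates and return the decision values
# on the test set (hypothetical helper, for illustration only).
fit_dec <- function() {
  m <- svm(data.frame(x, y), cl, probability = FALSE)
  p <- predict(m, newdata = data.frame(x2, y2), decision.values = TRUE)
  as.numeric(attr(p, "decision.values"))
}

dec1 <- fit_dec()
dec2 <- fit_dec()

identical(dec1, dec2)   # TRUE would mean the fit itself is reproducible
max(abs(dec1 - dec2))   # size of the largest discrepancy, if any
```

If the decision values agree while the probabilities from the example above do not, that would point at the probability computation as the place to look.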
## Technical details:
I'm using the latest R version, 2.7.0, on an up-to-date Windows Vista machine (same
problem on another computer with Windows XP).
> sessionInfo()
R version 2.7.0 (2008-04-22)
i386-pc-mingw32
locale:
LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] e1071_1.5-18 class_7.2-41
loaded via a namespace (and not attached):
[1] tools_2.7.0
