Hi,
I'm studying SVMs and found that if I run an SVM in R, Weka, and Python, the
results differ. So, to eliminate possible pitfalls, I decided to use the
standard iris dataset and wrote implementations in R, Weka, and Python for the
same SVM/kernel. I think the choice of kernel does not matter; it only needs to
be consistent among implementations. I excluded cross-validation, since Python
does not have it, and tried to keep a consistent set of input parameters across
all implementations (I went through them all and the defaults seem consistent).
Weka and Python both produced identical confusion matrices, but the R results
stand apart (I tried both e1071 and kernlab; they are consistent with each
other, but differ from Weka/Python). That's why I decided to post my message to
the R community and ask for help identifying the "problem" (if any), or for a
reasonable explanation of why the R results can differ. Please note that all
implementations use libsvm underneath (at least that's what I got from
reading), so I would expect the results to be the same. I understand that seeds
may differ, but I used the entire dataset without any sampling; maybe there is
internal normalization?
I'm posting the code for all implementations along with the confusion-matrix
outputs. Feel free to reproduce and comment.
Thanks,
Valentin.
Weka:
--------------------------------------------------
#!/usr/bin/env bash
# set path to Weka
export CLASSPATH=/Applications/weka-3-6-9.app/Contents/Resources/Java/weka.jar
data=./iris.arff
kernel="weka.classifiers.functions.supportVector.RBFKernel -C 250007 -G 0.01"
c=1.0
t=0.001
# -V The number of folds for the internal cross-validation. (default -1, use training data)
# -N Whether to 0=normalize/1=standardize/2=neither. (default 0=normalize)
# -W The random number seed. (default 1)
#opts="-C $c -L $t -N 2 -V -1 -W 1"
opts="-C $c -L $t -N 2"
cmd="java weka.classifiers.functions.SMO"
if [ "$1" == "help" ]; then
$cmd
exit 0
fi
$cmd $opts -K "$kernel" -t $data
--------------------------------------------------
a b c <-- classified as
50 0 0 | a = Iris-setosa
0 47 3 | b = Iris-versicolor
0 5 45 | c = Iris-virginica
Python:
--------------------------------------------------
from sklearn import svm, datasets
from sklearn.metrics import classification_report, confusion_matrix

def report(clf, x_test, y_test):
    y_pred = clf.predict(x_test)
    print(clf)
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))

def classifier():
    # load the iris dataset and train on the full data (no sampling)
    iris = datasets.load_iris()
    x_train = iris.data
    y_train = iris.target
    regC = 1.0  # SVM regularization parameter
    clf = svm.SVC(kernel='rbf', gamma=0.01, C=regC).fit(x_train, y_train)
    report(clf, x_train, y_train)

if __name__ == '__main__':
    classifier()
--------------------------------------------------
[[50 0 0]
[ 0 47 3]
[ 0 5 45]]
R:
--------------------------------------------------
library(kernlab)
library(e1071)
# load data
data(iris)
# run the svm algorithm (e1071 library) on the data with the given kernel
model <- svm(Species ~ ., data = iris, kernel = "radial", gamma = 0.01)
print(model)
# the last column of this dataset is what we'll predict, so we exclude it
prediction <- predict(model, iris[, -ncol(iris)])
# the last column is what we check the predictions against
tab <- table(pred = prediction, true = iris[, ncol(iris)])
print(tab)
cls <- classAgreement(tab)
msg <- sprintf("Correctly classified: %f, kappa %f", cls$diag, cls$kappa)
print(msg)
--------------------------------------------------
true
pred setosa versicolor virginica
setosa 50 0 0
versicolor 0 46 11
virginica 0 4 39
[1] "Correctly classified: 0.900000, kappa 0.850000"
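P.S. One way to probe the internal-normalization guess from the Python side: if
I read the docs correctly, e1071's svm() has a `scale` argument that defaults to
TRUE (features are standardized before training), whereas sklearn's SVC does no
scaling. A minimal sketch that mimics that preprocessing before fitting,
assuming the scaling is indeed the source of the discrepancy (I have not
verified the two scalings are numerically identical):

```python
from sklearn import svm, datasets
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()

# standardize each feature to zero mean / unit variance, mimicking
# e1071's default scale=TRUE preprocessing (note: e1071 uses the
# sample sd with n-1, StandardScaler the population sd, so the
# scaled values can differ very slightly)
x_scaled = StandardScaler().fit_transform(iris.data)

clf = svm.SVC(kernel='rbf', gamma=0.01, C=1.0).fit(x_scaled, iris.target)
print(confusion_matrix(iris.target, clf.predict(x_scaled)))
```

If this confusion matrix moves toward the R one, the scaling explains the gap;
conversely, svm(..., scale = FALSE) on the R side should then move toward the
Weka/Python matrices.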