Running Weka's command line with calls to system(), like this
> system("java weka.classifiers.bayes.NaiveBayes -K -t HWlrTrain.arff
-o")
=== Confusion Matrix ===
    a    b   <-- classified as
 3518  597 |  a = NoSpray
  644  926 |  b = Spray
=== Stratified cross-validation ===
=== Confusion Matrix ===
    a    b   <-- classified as
 3512  603 |  a = NoSpray
  653  917 |  b = Spray
So far, no surprises, except that I might have expected a few more
misclassifications in the cross-validation.
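(For reference, capturing that output in R is just a matter of something
like

> out <- system(paste("java weka.classifiers.bayes.NaiveBayes",
+                     "-K -t HWlrTrain.arff -o"), intern = TRUE)

where intern = TRUE returns the evaluation text as a character vector;
this assumes weka.jar is already on the Java classpath.)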
However, if I use the same data in R via RWeka, like this:

> train.df <- read.arff("HWlrTrain.arff")
> NB <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
> wNB <- NB(decision ~ ., data = train.df,
+           control = Weka_control(K = TRUE))
> summary(wNB)
=== Summary ===
Correctly Classified Instances       4437               78.0475 %
Incorrectly Classified Instances     1248               21.9525 %
Kappa statistic                         0.4446
Mean absolute error                     0.2679
Root mean squared error                 0.3924
Relative absolute error                67.0055 %
Root relative squared error            87.7545 %
Coverage of cases (0.95 level)         97.9244 %
Mean rel. region size (0.95 level)     83.0519 %
Total Number of Instances            5685
=== Confusion Matrix ===
    a    b   <-- classified as
 3520  595 |  a = NoSpray
  653  917 |  b = Spray
The resulting confusion matrix is different from both the training and
the cross-validation matrices from Weka's command line.
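To compare the numbers directly, something like this should tabulate the
in-sample predictions in the same layout as Weka's matrix (rows = actual,
columns = predicted; assuming the class column is named decision, as in
the formula above):

> table(actual = train.df$decision,
+       predicted = predict(wNB, train.df))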
Somewhat ironically, if I use the model to predict on test data, like this:

> predict(wNB, test.df)

I get exactly the same results as I would from the Weka CLI.
Maybe the difference isn't important, but I would have expected the
two approaches to do exactly the same thing.
Any possible explanations?
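P.S. In case it matters, I take it the closer analogue of the CLI's
stratified cross-validation block would be something like

> evaluate_Weka_classifier(wNB, numFolds = 10)

whereas summary(wNB) reports the training-set evaluation, so the training
block above is the one I'm comparing it to.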
--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
___ Patrick Connolly
{~._.~} Great minds discuss ideas
_( Y )_ Average minds discuss events
(:_~*~_:) Small minds discuss people
(_)-(_) ..... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.