Yes, that's true. On a test set, the highest predicted probability of
belonging to the smaller class is about 40%. (Incidentally, test-set accuracy
is much higher when I use the best-according-to-Kappa model instead of the
best-according-to-Accuracy model.)
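(For reference, I compared them roughly like this; model.kappa, model.acc,
test.x, and test.y are stand-in names for the two train() fits and the
held-out data:)

library(caret)

# confusionMatrix() reports per-class sensitivity/specificity as well as
# overall Accuracy and Kappa, so the two fits can be compared on the same
# held-out test set.
confusionMatrix(predict(model.kappa, test.x), test.y)
confusionMatrix(predict(model.acc, test.x), test.y)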
It looks like the ctree() method supports case weights, but all that seems to
do is rescale the class likelihoods, which isn't what I want. (That is, if I
assign a weight of 2 to every small-class instance, it generates the same
model, but reports a likelihood of about 80% instead of 40% for the
most-confident instances!)
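(For concreteness, the weighted fit looked roughly like this; 'small' is a
stand-in for the actual minority-class label:)

library(party)

# ctree() accepts non-negative integer case weights, so a weight of 2 on
# every minority-class row is equivalent to duplicating those rows.
df <- cbind(dat.dts, Class = dat.dts.class)
w <- ifelse(df$Class == 'small', 2L, 1L)
fit <- ctree(Class ~ ., data = df, weights = w)
treeresponse(fit, newdata = df)  # per-observation class probabilities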
I'm still not really understanding why Kappa isn't acting like a
monotonically increasing function of Accuracy, though.
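(A toy calculation does at least convince me that Kappa isn't determined by
Accuracy alone; these are made-up 2x2 confusion matrices, rows = predicted,
columns = observed, with the same accuracy and the same observed base rate:)

# Cohen's Kappa: (p_o - p_e) / (1 - p_e), where the chance-agreement term
# p_e is built from the row and column marginals jointly, not from the
# observed base rate alone.
kappa2 <- function(tab) {
  n <- sum(tab)
  p_o <- sum(diag(tab)) / n
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2
  (p_o - p_e) / (1 - p_e)
}

# Both matrices have accuracy 0.96 and a 4% observed minority, but Kappa is
# ~0.48 for the first and exactly 0 for the second, which predicts the
# majority class everywhere.
a <- matrix(c(20, 20, 20, 940), 2, 2)
b <- matrix(c(0, 40, 0, 960), 2, 2)
sapply(list(a, b), kappa2)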
Thanks!
On Wed, Jun 22, 2011 at 8:12 PM, kuhnA03 <max.kuhn@pfizer.com> wrote:
> Harlan,
>
> It looks like your model is predicting (almost) everything to be the
> majority class (accuracy is almost the same as the largest class
> percentage). Try setting a test set aside and use confusionMatrix to look at
> how the model is predicting in more detail. You can try other models that
> will let you weight the minority class higher to get a more balanced
> prediction.
>
> Max
>
> On 6/22/11 3:37 PM, "Harlan Harris" <harlan@harris.name> wrote:
>
> Hello,
>
> When evaluating different learning methods for a categorization problem
> with the (really useful!) caret package, I'm getting confusing results from
> the Kappa computation. The data is about 20,000 rows and a few dozen
> columns, and the categories are quite asymmetrical, 4.1% in one category and
> 95.9% in the other. When I train a ctree model as:
>
> model <- train(dat.dts, dat.dts.class,
>                method = 'ctree',
>                tuneLength = 8,
>                trControl = trainControl(number = 5, workers = 1),
>                metric = 'Kappa')
>
> I get the following puzzling numbers:
>
> mincriterion  Accuracy   Kappa   Accuracy SD  Kappa SD
>         0.01     0.961  0.0609       0.00151    0.0264
>         0.15     0.962  0.049        0.00116    0.0248
>         0.29     0.963  0.0405       0.00227    0.035
>         0.43     0.964  0.0349       0.00257    0.0247
>         0.57     0.964  0.0382       0.0022     0.0199
>         0.71     0.964  0.0354       0.00255    0.0257
>         0.85     0.964  0.036        0.00224    0.024
>         0.99     0.965  0.0091       0.00173    0.0203
>
> (mincriterion sets the threshold, in terms of 1 - p-value, that a candidate
> split must exceed before it is added to the tree.) The Accuracy numbers look
> sorta reasonable, if not great; the model overfits and barely beats the base
> rate if it builds a complicated tree. But the Kappa numbers go in the
> opposite direction, and here's where I'm not sure what's going on. The
> examples in the vignette show Accuracy and Kappa being positively
> correlated. I thought Kappa was just (Accuracy - baserate)/(1 - baserate),
> but the reported Kappa is definitely not that.
>
> Suggestions? Aside from looking for a better model, which would be good
> advice here, what metric would you recommend? Thank you!
>
> -Harlan