Andrew Perrin wrote:
> Greetings.
>
> I've been experimenting with some algorithms for document classification
> (specifically, a Naive Bayes classifier and a kNN classifier) and I would
> now like to calculate some inter-rater reliability scores. I have the data
> in a PostgreSQL database, such that for each document, each measure (there
> are 9) has three variables: ap_(measure), nb_(measure), and
> knn_(measure). ap is me (Andrew Perrin), nb is Naive Bayes, and knn is
> knn.
>
> I have two questions:
> 1.) I have used the code in the Using R for Psychology... paper to
> calculate Cohen's Kappa (kappaFor2). It returns a (fairly low) kappa, but
> also some warnings I don't understand:
> > kappaFor2(ap.nb.df$ap.sub,ap.nb.df$nb.sub)
> kappa S.E. z.stat p.value
> 0.09411765 0.33707660 0.27921738 0.78007800
> Warning messages:
> 1: longer object length is not a multiple of shorter object length in: tm1 * tm2
> 2: longer object length is not a multiple of shorter object length in: tm1 * tm2
> 3: longer object length is not a multiple of shorter object length in: tm1 + tm2
>
It looks like ap.sub and nb.sub have different numbers of levels for some
reason. Kappa compares two categorical variables with identical levels
(there are methods, though, for matching them if only the labelling
differs and you don't know the mapping).
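
In case it helps, here is a minimal sketch of one way to put both
variables on a common set of levels before tabulating. The vectors are
made up for illustration (the real ap.sub and nb.sub come from your
database), and I am assuming both are factors; I have not checked how
kappaFor2 handles them internally. The kappa at the end is the standard
unweighted Cohen formula computed by hand from the square table.

```r
# Hypothetical ratings, for illustration only
ap <- factor(c("a", "b", "c", "a", "b", "c"))
nb <- factor(c("a", "b", "a", "a", "b", "b"))  # "c" is never predicted

# The two factors do not share the same level set
nlevels(ap)
nlevels(nb)

# Re-code both on the union of their levels so the table is square
lev <- union(levels(ap), levels(nb))
ap2 <- factor(ap, levels = lev)
nb2 <- factor(nb, levels = lev)
tab <- table(ap2, nb2)

# Unweighted Cohen's kappa from the square table
n     <- sum(tab)
po    <- sum(diag(tab)) / n                      # observed agreement
pe    <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement
kappa <- (po - pe) / (1 - pe)
kappa
```

If the warnings disappear once both variables share the same levels, the
mismatch was probably the cause.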
> 2.) I'd be interested in other measures of reliability, specifically ones
> from the NLP literature such as precision, recall, and F1. These seem more
> interesting for my uses, if for no other reason than what I'm really
> interested in is comparing the success of nb and knn at approaching the ap
> categories. Are there any packages that provide such measures?
There is also the well-known Rand index, which deals with this kind of
problem by analysing all pairs of observations - you might have a look
at classAgreement() in package e1071.
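
A rough sketch of both ideas, under some assumptions: the data are
invented, ap is treated as the reference ("truth") and the classifier as
the prediction, and both factors share the same levels. Per-class
precision, recall and F1 are computed directly from the confusion table
in base R; the classAgreement() call only runs if e1071 is installed.

```r
# Hypothetical labels, for illustration only
lev   <- c("a", "b", "c")
truth <- factor(c("a", "a", "a", "b", "b", "c"), levels = lev)  # e.g. ap
pred  <- factor(c("a", "a", "b", "b", "b", "c"), levels = lev)  # e.g. nb

# Rows are the reference categories, columns the predictions
tab <- table(truth, pred)

# Per-class measures; a class that is never predicted gives NaN precision
precision <- diag(tab) / colSums(tab)
recall    <- diag(tab) / rowSums(tab)
f1        <- 2 * precision * recall / (precision + recall)

round(cbind(precision, recall, f1), 3)

# Agreement indices (diag, kappa, rand, crand) from e1071, if available
if (requireNamespace("e1071", quietly = TRUE)) {
  print(e1071::classAgreement(tab))
}
```

Running this once with nb and once with knn as the prediction would give
comparable figures for the two classifiers against the ap categories.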
-g
d.
>
> Many thanks.
>
> ----------------------------------------------------------------------
> Andrew J Perrin - andrew_perrin at unc.edu - http://www.unc.edu/~aperrin
> Assistant Professor of Sociology, U of North Carolina, Chapel Hill
> 269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA
>
>
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
>
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
--
Mag. David Meyer Wiedner Hauptstrasse 8-10
Vienna University of Technology A-1040 Vienna/AUSTRIA
Department for Tel.: (+431) 58801/10772
Statistics and Probability Theory mail: david.meyer at ci.tuwien.ac.at