thr3ads.net - R help - [R] error rate for cluster analysis [Sep 2007]

Tom Boonen

2007-Sep-24 17:15 UTC

[R] error rate for cluster analysis

Hi all,

I am looking for an R function, or a metric that I could code myself,
that compares the results of a clustering exercise with a given
solution key.

An example: say four elements are clustered, and the number of
clusters is unknown a priori. For my guess and for the solution I have
a matrix with two columns: the first column gives the cluster id,
the second the element id:

guess <- cbind(c(1, 1, 2, 3), c(1, 2, 3, 4))
solution <- cbind(c(1, 2, 3, 3), c(1, 2, 3, 4))
colnames(guess) <- colnames(solution) <- c("cluster.id", "element.id")
guess
solution

So here the guess is wrong in two ways. The guess claims elements
3 & 4 belong to distinct clusters, but in the solution we see that
they belong to the same cluster. Also, the guess claims elements 1 & 2
belong to one cluster, but in the solution we see they belong to
distinct clusters.
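
Both mistakes are disagreements about a pair of elements (together vs.
apart), so one candidate metric is the fraction of element pairs on which
the guess and the solution disagree. This is one minus the Rand index. A
minimal sketch, assuming both cluster id vectors are ordered by the same
element ids; the function name pair_disagreement is made up for
illustration and is not from any package:

```r
# Hypothetical helper: share of element pairs that two clusterings treat
# differently (same cluster in one, different clusters in the other).
# Assumes a[i] and b[i] refer to the same element.
pair_disagreement <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  same_a <- outer(a, a, "==")   # TRUE where a puts i and j together
  same_b <- outer(b, b, "==")   # TRUE where b puts i and j together
  upper <- upper.tri(same_a)    # count each unordered pair once
  mean(same_a[upper] != same_b[upper])
}

guess_ids    <- c(1, 1, 2, 3)   # cluster.id column of the guess
solution_ids <- c(1, 2, 3, 3)   # cluster.id column of the solution
pair_disagreement(guess_ids, solution_ids)  # 2 of 6 pairs disagree
```

The result does not depend on how the clusters are labelled, only on
which elements are grouped together. For a chance-corrected variant, the
mclust package provides adjustedRandIndex().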

What I am looking for is a function, or a metric that I could code up
myself, that defines a sensible distance between the guess and the
solution. There are various ways to do this, but I am wondering
whether one of the cluster analysis packages already offers a standard
way of doing it.

Thanks very much,
Tom
