I haven't read the book, but could anyone explain more about the agglomerative coefficient (ac) in agnes?

help(agnes) says that ac measures the amount of clustering structure found. From the definition given in help(agnes.object), however, it seems that as long as the dissimilarity of the merger in the final step of the algorithm is large enough, the ac value will be close to 1. So what does ac really mean?

Thank you,
Weiguang
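Here is a minimal sketch of the help(agnes.object) definition. For each observation i, m(i) is its dissimilarity to the first cluster it is merged with, divided by the dissimilarity of the final merger, and ac is the average of 1 - m(i). It uses base R's hclust() rather than agnes(), so it runs without the cluster package. The helper name ac_from_hclust is hypothetical. It also assumes that an observation's "dissimilarity to the first cluster it is merged with" is the height of that merge, which holds for single linkage. Because every m(i) is divided by the final merge height, a very large final merger does push ac towards 1, as the question suspects.

```r
# Hypothetical helper (not part of the cluster package): agglomerative
# coefficient of an hclust tree, following help(agnes.object).
# Assumes the dissimilarity of observation i to its first cluster equals
# the height of the merge in which i first appears (true for single linkage).
ac_from_hclust <- function(hc) {
  n <- length(hc$order)
  final_height <- tail(hc$height, 1)  # dissimilarity of the last merger
  m <- numeric(n)
  for (i in seq_len(n)) {
    # In hc$merge, singleton observation i is coded as -i and appears exactly once
    step <- which(hc$merge[, 1] == -i | hc$merge[, 2] == -i)
    m[i] <- hc$height[step] / final_height
  }
  mean(1 - m)
}

d1 <- dist(c(1, 2, 1000, 1001))
d2 <- dist(c(1, 2, 3, 1000))
ac_from_hclust(hclust(d1, method = "single"))  # 1 - 1/998, i.e. 0.998998
ac_from_hclust(hclust(d2, method = "single"))  # (3/4) * 996/997, i.e. 0.7492477
```

Both values agree with the agnes(x, method="single") figures reported in this thread for the same two series.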
Google really can be a very useful thing, in case you haven't found that. This is the first hit I got with `agglomerative coefficient':
http://www.unesco.org/webworld/idams/advguide/Chapt7_1_4.htm

Andy
Thanks again, Andy.

I understand the definition of AC, but I have trouble picturing the amount of "clear clustering structure" it measures. To put things into perspective, for the two series

1, 2, 1000, 1001
and
1, 2, 3, 1000

agnes(x, method="single") gives ac values of 0.998998 and 0.7492477 respectively, yet it seems to me that both have fairly clear clustering structures.

--- "Liaw, Andy" <andy_liaw at merck.com> wrote:
> BTW, I checked the book. You're not going to find much
> more than that.

Thanks for checking.

Weiguang
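Working the help(agnes.object) definition through by hand for these two series reproduces both numbers. The calculation uses single linkage and assumes that each observation's dissimilarity to its first cluster is the merge height, which is an assumption about how agnes computes it. Here d_i is the height at which observation i first joins a cluster, and D is the height of the final merger:

```latex
\mathrm{AC} = \frac{1}{n}\sum_{i=1}^{n}\bigl(1 - m(i)\bigr),
\qquad m(i) = \frac{d_i}{D}

% Series 1, 2, 1000, 1001: every d_i = 1, and D = 1000 - 2 = 998
\mathrm{AC} = 1 - \frac{1}{998} \approx 0.998998

% Series 1, 2, 3, 1000: d_1 = d_2 = d_3 = 1, while d_4 = D = 1000 - 3 = 997
\mathrm{AC} = \frac{1}{4}\Bigl[3\Bigl(1 - \frac{1}{997}\Bigr) + \Bigl(1 - \frac{997}{997}\Bigr)\Bigr]
            = \frac{3}{4}\cdot\frac{996}{997} \approx 0.7492477
```

The isolated point 1000 is merged only in the final step, so its m(i) is exactly 1 and it contributes 0 to the average. With n = 4, that alone caps ac at 3/4, however clear the separation is.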
> -----Original Message-----
> From: Weiguang Shi
>
> To put things into perspective, for two series
> 1,2,1000,1001
> and
> 1,2,3,1000
> agnes(x, method="single") generates ac values of
> 0.998998 and 0.7492477 respectively, yet it seems to
> me that both have fairly clear clustering structures.

It has to do with sample sizes. Consider the following:

library(cluster)

## Put the first ceiling(n * prop1) values of x around center[1] and the
## rest around center[2], then return the agglomerative coefficient.
testAC <- function(prop1 = 0.5, x = rnorm(50), center = c(0, 100), ...) {
    n <- length(x)
    n1 <- ceiling(n * prop1)
    n2 <- n - n1
    agnes(x + rep(center, c(n1, n2)), ...)$ac
}

Now some tests:

x <- rnorm(50)  # assumed: x was not shown in the post, so the numbers below will vary from draw to draw
> sapply(c(.25, .5), testAC, x = x[1:4], method = "single")
[1] 0.7427591 0.9862944
> sapply(1:5 / 10, testAC, x = x[1:10], method = "single")
[1] 0.8977139 0.9974224 0.9950061 0.9946366 0.9946366
> sapply(1:5 / 10, testAC, x = x, method = "single")
[1] 0.9982955 0.9969757 0.9971114 0.9971127 0.9975111

So it seems that AC does not count isolated singletons as cluster structure. This is only discernible with small sample sizes, though.

Andy
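The following small deterministic sketch illustrates the sample-size effect. Under the help(agnes.object) definition, a point that is merged only at the final step has m(i) = 1 and contributes 0 to the average of 1 - m(i). With one such point among n observations, ac therefore stays below 1 - 1/n, and that singleton's weight in the average shrinks as n grows. The data (n - 1 evenly spaced points in [0, 1] plus one point at 100) and the helper singleton_ac are hypothetical and chosen only for illustration.

```r
library(cluster)  # provides agnes()

# Hypothetical helper: n - 1 evenly spaced points in [0, 1] plus one
# isolated point at 100; returns the single-linkage agglomerative coefficient.
singleton_ac <- function(n) {
  x <- c(seq(0, 1, length.out = n - 1), 100)
  agnes(x, method = "single")$ac
}

sizes <- c(4, 10, 50, 200)
ac <- sapply(sizes, singleton_ac)

# The isolated point adds 0 to the average of 1 - m(i), so ac < 1 - 1/n;
# as n grows that single zero term matters less and ac approaches 1.
print(rbind(n = sizes, ac = round(ac, 4), bound = round(1 - 1 / sizes, 4)))
```

Because the data involve no random numbers, the output is reproducible, unlike the rnorm()-based runs above.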
Well, I am not sure you can call a single figure a cluster. Sure, it's not near the others, but how can you conceptually measure its cluster properties? It seems reasonable that there has to be some doubt about it.

Back to that Google search: hit number 3, www.stat.ncu.edu.tw/teacher/hungy/mva/notes/lecture-cluster-example.pdf, gives examples which are not close to 1. It says that "The quality of an agglomerative clustering of the data can be measured by the agglomerative coefficient", and this is ascribed to Kaufman L. and Rousseeuw P. (1990), "Finding Groups in Data, an Introduction to Cluster Analysis", Wiley, New York.

After reading some of the recent work on clustering, I realised that clustering is as much art as anything else. There is a wealth of papers arguing about which methods should be used to assess the effectiveness of the clustering process. Whichever evaluation method you use, its numbers are not absolute; they need to be seen as relative. They also need to be seen as one attempt at modelling a form of quality assessment for which there is no clear winner.

So the bottom line is that if, for your purposes, a single number on its own should be classified as a group, you may well have to define your own method of evaluation.

Tom

> -----Original Message-----
> From: Weiguang Shi [mailto:wgshi2001 at yahoo.ca]
> Sent: Thursday, 27 January 2005 7:28 AM
> To: Liaw, Andy
> Cc: rhelp
> Subject: RE: [R] agglomerative coefficient in agnes (cluster)