>>>>> Anna F...
>>>>> on Thu, 1 May 2014 22:09:28 +0000 writes:
> Hi Martin,
> I am a statistician at National Jewish Health in Colorado, and I have
> been working on clustering a dataset using Ward's minimum variance. When
> plotting the dendrogram, the y-axis is labeled as 'height'. Can you
> explain to me (or point me in the right direction) on how this distance between
> merging clusters is calculated for the Ward method? I have found the calculation
> that SAS uses, and I want to check if it is the same in your method.
> Here is a summary of the code I am using:
> Agnes(x,method="ward",diss=TRUE)
Well, as R is case sensitive, it must be
    agnes(x, method="ward", diss=TRUE)
Interestingly, the new version of R, R 3.1.0, now has two
different versions of Ward in hclust(), "ward.D" and "ward.D2":
  --> http://stat.ethz.ch/R-manual/R-patched/library/stats/html/hclust.html
where it is stated that the previous method "ward" (now "ward.D")
basically did not implement Ward's criterion unless the user called it
in a specific way (namely with squared distances), whereas "ward.D2"
does, and agnes(*, method="ward") did and still does.
*The* reference for all basic routines in the 'cluster' package is
Kaufman, L. and Rousseeuw, P.J. (1990). _Finding
Groups in Data: An Introduction to Cluster Analysis_.
Wiley, New York.
Alternatively, the source code of R and of all packages is open.
For the cluster package, you can either get it as
cluster_*.tar.gz from CRAN, or browse the (subversion)
development version at http://svn.r-project.org/
Specifically, the C code which computes agnes() is
https://svn.r-project.org/R-packages/trunk/cluster/src/twins.c
and there,
    case 4: /* 4: ward's method */
        ta = (double) kwan[la];
        tb = (double) kwan[lb];
        tq = (double) kwan[lq];
        fa = (ta + tq) / (ta + tb + tq);
        fb = (tb + tq) / (ta + tb + tq);
        fc = -tq / (ta + tb + tq);
        int nab = ind_2(la, lb);
        dys[naq] = sqrt(fa * dys[naq] * dys[naq] +
                        fb * dys[nbq] * dys[nbq] +
                        fc * dys[nab] * dys[nab]);
        break;
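contains the distance calculation for Ward: when clusters la and lb
are merged, the dissimilarity between the merged cluster and every
other cluster lq is updated as above, where kwan[] holds the cluster
sizes. The 'height' in the dendrogram is the value of this
dissimilarity between the two clusters at the step where they are merged.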
...
[ In private communication, Anna agreed that I reply
publicly to R-help so that others can chime in and everything will be
searchable for people with a similar question. MM ]
Best regards,
Martin Maechler