Hello all,

This might have been fixed in later versions (I am using R 1.7.0). The
r-help archive contains messages reporting similar problems, but no
reports of code fixes. I have encountered a couple of problems using the
silhouette function. One occurs when the clustering contains clusters
composed of 1 element (Martin Maechler posted code a few months ago that
fixes a similar problem that occurs when clusters have only 2 elements,
but not the case with 1 element). The other problem is due to
silhouette's assumption that the clusters are numbered sequentially
starting at 1. One of the clustering programs I use (snob) assigns more
or less arbitrary integer ids to clusters, starting from 3! (Clusters 1
and 2 have special meaning in snob.) The modified code fixing both
problems is included below; changes are commented.

Best,
Murad

silhouette.default <-
function (x, dist, dmatrix, ...)
{
    cll <- match.call()
    if (!is.null(cl <- x$clustering))
        x <- cl
    n <- length(x)
    if (!all(x == round(x)))
        stop("`x' must only have integer codes")
    k <- length(clid <- sort(unique(x)))
    if (k <= 1 || k >= n)
        return(NA)
    if (missing(dist)) {
        if (missing(dmatrix))
            stop("Need either a dissimilarity `dist' or diss.matrix `dmatrix'")
        if (is.null(dm <- dim(dmatrix)) || length(dm) != 2 ||
            !all(n == dm))
            stop("`dmatrix' is not a dissimilarity matrix compatible to `x'")
    }
    else {
        dist <- as.dist(dist)
        if (n != attr(dist, "Size"))
            stop("clustering `x' and dissimilarity `dist' are incompatible")
        dmatrix <- as.matrix(dist)
    }
    wds <- matrix(NA, n, 3, dimnames = list(names(x), c("cluster",
        "neighbor", "sil_width")))
    for (j in 1:k) {
        Nj <- sum(iC <- x == clid[j])
        #
        # the following line changed from wds[iC, "cluster"] <- j
        #
        wds[iC, "cluster"] <- clid[j]
        a.i <- if (Nj > 1)
            colSums(dmatrix[iC, iC])/(Nj - 1)
        else 0
        #
        # the following line changed from
        # diC <- rbind(apply(dmatrix[!iC, iC], 2,
        #                    function(r) tapply(r, x[!iC], mean)))
        #
        diC <- rbind(apply(cbind(dmatrix[!iC, iC]), 2,
                           function(r) tapply(r, x[!iC], mean)))
        minC <- max.col(-t(diC))
        wds[iC, "neighbor"] <- clid[-j][minC]
        #
        # the following line changed from
        # b.i <- diC[cbind(minC, seq(minC))]
        #
        b.i <- diC[cbind(minC, seq(along=minC))]
        s.i <- (b.i - a.i)/pmax(b.i, a.i)
        wds[iC, "sil_width"] <- s.i
    }
    attr(wds, "Ordered") <- FALSE
    attr(wds, "call") <- cll
    class(wds) <- "silhouette"
    wds
}
--
Murad Nayal M.D. Ph.D.
Department of Biochemistry and Molecular Biophysics
College of Physicians and Surgeons of Columbia University
630 West 168th Street. New York, NY 10032
Tel: 212-305-6884 Fax: 212-305-6926
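
A note on the second problem described above (cluster ids that do not run from 1 to k): one possible workaround that leaves silhouette() untouched is to recode the ids before calling it. The following is only a minimal sketch in base R. The vector `ids` is made-up data for illustration, not real snob output, and the names `clid` and `codes` are chosen here for the example.

```r
# Hypothetical cluster ids of the kind described above
# (made-up values, starting at 3 and with gaps).
ids <- c(3, 7, 3, 12, 7, 12, 12)

# Sorted distinct ids; the position of each id in this vector
# becomes its new code, so the codes lie in 1:k.
clid  <- sort(unique(ids))
codes <- match(ids, clid)

codes          # 1 2 1 3 2 3 3
clid[codes]    # maps the codes back to the original ids
```

The recoded vector `codes` could then be passed as the clustering, and `clid` used afterwards to translate the "cluster" and "neighbor" columns back to the original ids.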

>>>>> "Murad" == Murad Nayal <mn216 at columbia.edu>
>>>>>     on Wed, 21 Jan 2004 15:19:28 -0500 writes:

    Murad> This might have been fixed in later versions (I am
    Murad> using R1.7.0),

yes, the bug has been fixed "long ago", from my ChangeLog (!), it was
2003-07-18.

    Murad> r-help archive contains messages reporting similar
    Murad> problems but no reports of codes fixes. I have
    Murad> encountered a couple of problems using the silhouette
    Murad> function. one occurs when the clustering contains
    Murad> clusters composed of 1 element (Martin Maechler
    Murad> posted code few months ago that fixes a similar
    Murad> problem that occurs when clusters have only 2
    Murad> elements but not the case with 1 element). the other
    Murad> problem is due to silhouette's assumption that the
    Murad> clusters are numbered sequentially starting at 1.

which is what ?silhouette tells you as well:

  >> Arguments:
  >>
  >>        x: an object of appropriate class; for the 'default' method an
  >>           integer vector with cluster codes in '1:k' or a list with
  >>           such an 'x$clustering' component.

So, definitely not a bug, but it's your problem of using silhouette()
on an object that is not of appropriate structure.

    Murad> one of the clustering programs I use (snob) assigns
    Murad> more or less arbitrary integer ids to clusters
    Murad> starting from 3! (clusters 1 and 2 have special
    Murad> meaning in snob). the modified code fixing both
    Murad> problems is included below, changes are commented.

Thank you for the good attempt, but really you should work from the
source where silhouette.default does also contain comments, and as
said, has long been fixed for the case of 1-element clusters
{a better fix is not to use cbind() but to use the ominous
 ", drop = FALSE" when subsetting matrices!}

I'm still willing to consider your *feature request* (as opposed to
bug fix) of allowing inputs where the grouping vector does contain
other than "1:g".

I'll send you the current source of silhouette.default in a private
mail.

Thanks for your collaboration on improving R!
Regards,
Martin Maechler <maechler at stat.math.ethz.ch>  http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum LEO C16     Leonhardstr. 27
ETH (Federal Inst. Technology)                  8092 Zurich  SWITZERLAND
phone: x-41-1-632-3408          fax: ...-1228   <><
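
To illustrate the ", drop = FALSE" remark: when a cluster has a single member, subsetting the dissimilarity matrix by that cluster's column silently returns a plain vector, which is why apply() over columns fails in the 1-element case. The sketch below uses a small made-up data set (the values in `d` and `x` are assumptions for illustration only) and shows the subsetting behaviour; it is not the code from the cluster package itself.

```r
# Toy dissimilarities between 4 points on a line (made-up values).
d <- as.matrix(dist(c(0, 1, 5, 9)))
x <- c(1, 1, 2, 3)       # clusters 2 and 3 each contain one element
iC <- x == 2             # membership indicator for the singleton cluster 2

dim(d[!iC, iC])                  # NULL: collapsed to a vector
m <- d[!iC, iC, drop = FALSE]
dim(m)                           # 3 1: still a matrix, one column per member

# With the matrix shape preserved, the per-cluster mean dissimilarities
# can be computed column by column, with no cbind() needed.
diC <- rbind(apply(m, 2, function(r) tapply(r, x[!iC], mean)))
diC                              # 4.5 for cluster 1, 4 for cluster 3
```

Compared with wrapping the subset in cbind(), drop = FALSE states the intent directly at the point of subsetting and behaves the same for clusters of any size.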