This is not a bug. It just means that the algorithm sometimes ends up with
an empty cluster, and as you asked for 34 clusters and it had 33 or fewer,
it stops.
What to do in this situation is currently under discussion, but the advice
given is good: try another set of initial centres.
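One way to act on that advice is to retry with fresh starting centres until
a run succeeds. The following is only a sketch: kmeans_retry and max_tries
are hypothetical names, not part of R, and the matrix x is synthetic data
standing in for cavint.p.r. It draws k rows of the data as initial centres
on each attempt and treats any error from kmeans() as a failed attempt.

```r
# Hypothetical helper (not part of R): retry kmeans() with new random
# initial centres until a run completes without an error.
kmeans_retry <- function(x, k, max_tries = 20) {
  x <- as.matrix(x)
  for (i in seq_len(max_tries)) {
    # Draw k distinct rows of the data to use as initial centres.
    init <- x[sample(nrow(x), k), , drop = FALSE]
    # Any error (such as "empty cluster") counts as a failed attempt.
    fit <- tryCatch(kmeans(x, centers = init), error = function(e) NULL)
    if (!is.null(fit)) return(fit)
  }
  stop("no successful kmeans run in ", max_tries, " attempts")
}

# Synthetic stand-in data (assumption): 200 points in 3 dimensions.
set.seed(1)
x <- matrix(runif(200 * 3), ncol = 3)
fit <- kmeans_retry(x, k = 5)
table(fit$cluster)
```

If every attempt fails, that may be a sign that the requested number of
clusters is too large for the data rather than a matter of bad luck.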
Please do read the description of a bug in the R FAQ, and do not misuse
the term to mean `something I do not understand'.
On Mon, 10 Nov 2003, Murad Nayal wrote:
> I have been getting the following intermittent error from kmeans:
>
> >str(cavint.p.r)
> num [1:1967, 1:13] 0.691 0.123 0.388 0.268 0.485 ...
> - attr(*, "dimnames")=List of 2
> ..$ : chr [1:1967] "6" "49" "87" "102" ...
> ..$ : chr [1:13] "HYD" "NEG" "POS" "OXY" ...
> > set.seed(34)
> > kmeans(cavint.p.r,centers=34)
> Error: empty cluster: try a better set of initial centers
>
> the seed being equal to the number of centers in this case is just a
> coincidence. I've encountered the same error with or without setting the
> seed at different numbers of clusters.
>
> there is nothing particularly unusual about cavint.p.r (no NAs, NULLs),
> except maybe for the fact that the rows sum to 1.
>
> > sum(is.na(cavint.p.r))
> [1] 0
> > sum(is.nan(cavint.p.r))
> [1] 0
> >
>
> I thought kmeans should select initial centers from the data if not
> given explicitly! any idea what might be going wrong?
And what makes you think it did not?
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595