Dr. Detlef Groth
2013-Mar-13 12:45 UTC
[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2
Hello,
Here is a reproducible example that either crashes R in kmeans() or
gives empty clusters when the nstart option is used with R 2.15.2.
library(cluster)
kmeans(ruspini,4)
kmeans(ruspini,4,nstart=2)
kmeans(ruspini,4,nstart=4)
kmeans(ruspini,4,nstart=10)
?kmeans
We either always get empty clusters or, after some further commands,
a segfault.
regards,
Detlef Groth
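
The example above can be made easier to check by catching errors and reporting empty clusters explicitly for each nstart value. The following is only a minimal sketch and not part of the original report: the helper name check_kmeans and the seed are assumptions chosen for illustration. Note that tryCatch() can intercept an R-level error, but it cannot intercept a segfault.

```r
library(cluster)  # provides the ruspini data set used above

# Hypothetical helper: run kmeans() once and summarise the outcome
# instead of letting an R-level error abort the whole script.
check_kmeans <- function(x, k, nstart) {
  fit <- tryCatch(
    kmeans(x, centers = k, nstart = nstart),
    error = function(e) conditionMessage(e)
  )
  if (is.character(fit)) {
    # kmeans() stopped with an error; keep its message for the report
    return(data.frame(nstart = nstart, ok = FALSE, empty = NA,
                      message = fit, stringsAsFactors = FALSE))
  }
  data.frame(nstart = nstart, ok = TRUE, empty = any(fit$size == 0),
             message = "", stringsAsFactors = FALSE)
}

set.seed(1)  # arbitrary seed so that a run can be repeated
results <- do.call(rbind, lapply(c(1, 2, 4, 10),
                                 function(n) check_kmeans(ruspini, 4, n)))
print(results)
```

Each row of results shows whether the call completed and whether any of the four clusters came back empty.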
------------
[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2
Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Feb 9 20:52:19 CET 2013
We need a reproducible example.
Uwe Ligges
On 03.02.2013 15:03, Luca Nanetti wrote:
> Dear experts,
> I am encountering a version-dependent issue.
>
> My laptop runs Ubuntu 12.04 LTS 64-bit, R 2.14.1; the issue explained below
> never occurred with this version of R
> My desktop runs Ubuntu 11.10 64-bit, R 2.13.2; what follows applies to this
> setup.
>
> The data I'm clustering is constituted by the rows of a 320 x 6 matrix
> containing integers ranging from 1 to 7, no missing data.
> I applied kmeans() to this matrix, literally, 256 x 10??? times using R
> version 2.13.2 or 2.14.1, without ever experiencing the slightest problem.
> My usual setup is with k=5, nstart=256, iter.max=50.
>
> Upgrading to R 2.15.2, I experienced either a warning message ('Empty
> cluster. Choose a better set of initial centers') or a catastrophic
> segfault. The only way I can get any solution at all is to set nstart to
> its default value, i.e. 1. However, when I simply repeat the clustering,
> the same issue still happens. Moreover, this is vastly suboptimal because
> of the risk of local minima.
>
> Something similar was reported many years ago, see
> https://stat.ethz.ch/pipermail/r-help/2003-November/041784.html. It was
> then suggested that R's behaviour was correct. I'm not familiar with such
> an early R version, but the up-to-date documentation of kmeans clearly
> states that "Except for the Lloyd-Forgy method, k clusters will always be
> returned if a number is specified."
> I am using the default Hartigan-Wong, and I specify an exact number k:
> thus, k clusters should be returned. They aren't, and the empty cluster is
> then more likely the symptom of a bug rather than the outcome of a 'true'
> local minimum.
>
> Using synaptic, I managed to downgrade R to version 2.13.2. The problem
> disappeared, i.e. the previous message/segfault no longer occurred.
>
> Summarizing: given the same dataset, either an unreasonable message or a
> segfault regularly happens in version 2.15.2 when invoking kmeans() on an
> Ubuntu 11.10 64bit machine. This does not happen at all in previous
> versions of R, on the same machine and operating system.
>
> I respectfully suggest that the behaviour shown in the aforementioned
> versions 2.13.2 and 2.14.1 should be considered 'normal', and that version
> 2.15.2 should revert to that.
>
> Kind regards,
> Luca Nanetti.
>
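
For readers without access to the data described in the quoted message, the setup can be approximated with simulated data. This is a minimal sketch under a loud assumption: uniformly random integers stand in for the real 320 x 6 matrix, which was never posted, so the sketch may well not trigger the problem. The seed and variable names are illustrative only.

```r
set.seed(42)  # arbitrary seed, not taken from the original post

# Stand-in for the unpublished data: 320 rows, 6 columns, integers 1..7.
x <- matrix(sample(1:7, 320 * 6, replace = TRUE), nrow = 320, ncol = 6)

# Settings quoted in the post: k = 5, nstart = 256, iter.max = 50,
# using the default Hartigan-Wong algorithm.
fit <- tryCatch(
  kmeans(x, centers = 5, nstart = 256, iter.max = 50),
  error = function(e) e
)

if (inherits(fit, "error")) {
  cat("kmeans() failed:", conditionMessage(fit), "\n")
} else {
  # The quoted documentation implies that every cluster should be non-empty.
  cat("cluster sizes:", fit$size, "\n")
  cat("any empty cluster:", any(fit$size == 0), "\n")
}
```

Running the same script under different R versions gives a direct comparison of the kind described in the message, with the caveat that a segfault would terminate the session rather than reach the error handler.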
Uwe Ligges
2013-Mar-13 19:48 UTC
[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2
On 13.03.2013 13:45, Dr. Detlef Groth wrote:
> Hello,
>
> Here is a reproducible example that either crashes R in kmeans() or
> gives empty clusters when the nstart option is used with R 2.15.2.
>
> library(cluster)
> kmeans(ruspini,4)
> kmeans(ruspini,4,nstart=2)
> kmeans(ruspini,4,nstart=4)
> kmeans(ruspini,4,nstart=10)
> ?kmeans
>
> We either always get empty clusters or, after some further commands,
> a segfault.

Yes, thanks, I can reproduce it in 2.15.3, but not in R-prerelease.
Maybe this is a side effect of a bug already fixed in R-prerelease.
Since R-2.15.3 is frozen now, please upgrade to R-prerelease, which
will become R-3.0.0 in April.

Best,
Uwe Ligges