Dr. Detlef Groth
2013-Mar-13 12:45 UTC
[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2
Hello,

here is a working, reproducible example that either crashes R when calling kmeans or gives empty clusters when the nstart option is used, with R 2.15.2.

library(cluster)
kmeans(ruspini, 4)
kmeans(ruspini, 4, nstart=2)
kmeans(ruspini, 4, nstart=4)
kmeans(ruspini, 4, nstart=10)
?kmeans

Either we always get empty clusters or, after some further commands, a segfault.

Regards,
Detlef Groth

------------

[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2
Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Feb 9 20:52:19 CET 2013

We need a reproducible example.

Uwe Ligges

On 03.02.2013 15:03, Luca Nanetti wrote:
> Dear experts,
> I am encountering a version-dependent issue.
>
> My laptop runs Ubuntu 12.04 LTS 64-bit, R 2.14.1; the issue explained below
> never occurred with this version of R.
> My desktop runs Ubuntu 11.10 64-bit, R 2.13.2; what follows applies to this
> setup.
>
> The data I'm clustering consists of the rows of a 320 x 6 matrix
> containing integers ranging from 1 to 7, with no missing data.
> I applied kmeans() to this matrix, literally, 256 x 10??? times using R
> version 2.13.2 or 2.14.1, without ever experiencing the slightest problem.
> My usual setup is k=5, nstart=256, iter.max=50.
>
> After upgrading to R 2.15.2, I got either a warning message ('Empty
> cluster. Choose a better set of initial centers') or a catastrophic
> segfault. The only way I can get any solution at all is to set nstart to
> its default value, i.e. 1. However, when I simply repeat the clustering, the
> same issue still happens. Moreover, this is vastly suboptimal because of the
> risk of local minima.
>
> Something similar was reported many years ago, see
> https://stat.ethz.ch/pipermail/r-help/2003-November/041784.html. It was
> then suggested that R's behaviour was correct. I'm not familiar with such
> an early R version, but the up-to-date documentation of kmeans clearly
> states that "Except for the Lloyd-Forgy method, k clusters will always be
> returned if a number is specified.".
> I am using the default Hartigan-Wong method, and I specify an exact number k:
> thus, k clusters should be returned. They aren't, so the empty cluster is
> more likely the symptom of a bug than the outcome of a 'true'
> local minimum.
>
> Using synaptic, I managed to downgrade R to version 2.13.2. The problem
> disappeared, i.e. the previous message/segfault didn't occur anymore.
>
> Summarizing: given the same dataset, either an unreasonable message or a
> segfault regularly happens in version 2.15.2 when invoking kmeans() on an
> Ubuntu 11.10 64-bit machine. This does not happen at all in previous
> versions of R, on the same machine and operating system.
>
> I respectfully suggest that the behaviour shown in the aforementioned
> versions 2.13.2 and 2.14.1 should be considered 'normal', and that version
> 2.15.2 should revert to that.
>
> Kind regards,
> Luca Nanetti.
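
The documentation quoted above says that k clusters should always be returned for the default Hartigan-Wong method. One way to check that expectation in a script, instead of inspecting the output by eye, is sketched below. The helper name safe_kmeans is hypothetical and not part of base R, and promoting every warning to an error is a choice made for this sketch. It can only detect the empty-cluster symptom; a segfault kills the R process and cannot be caught from R code.

```r
# Hypothetical helper (not part of base R): run kmeans() and verify that
# every one of the k requested clusters actually received points.
safe_kmeans <- function(x, k, nstart = 10, iter.max = 50) {
  fit <- tryCatch(
    kmeans(x, centers = k, nstart = nstart, iter.max = iter.max),
    warning = function(w) {
      # Assumption for this sketch: treat any warning, including the
      # empty-cluster message, as a hard failure.
      stop("kmeans() warned: ", conditionMessage(w), call. = FALSE)
    }
  )
  # tabulate() counts members for labels 1..k, so empty labels show up as 0.
  sizes <- tabulate(fit$cluster, nbins = k)
  if (any(sizes == 0L)) {
    stop("kmeans() returned an empty cluster", call. = FALSE)
  }
  fit
}

library(cluster)  # provides the ruspini data set used in this thread
set.seed(1)       # fix the random starts so the run is repeatable
fit <- safe_kmeans(ruspini, k = 4, nstart = 10)
print(fit$size)
```

On an R version that behaves as documented, the call should return a fit with four non-empty clusters; on an affected version it would be expected to stop with an error instead of silently returning a degenerate result.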
Uwe Ligges
2013-Mar-13 19:48 UTC
[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2
On 13.03.2013 13:45, Dr. Detlef Groth wrote:
> Hello,
>
> here is a working, reproducible example that either crashes R when calling
> kmeans or gives empty clusters when the nstart option is used, with R 2.15.2.
>
> library(cluster)
> kmeans(ruspini, 4)
> kmeans(ruspini, 4, nstart=2)
> kmeans(ruspini, 4, nstart=4)
> kmeans(ruspini, 4, nstart=10)
> ?kmeans
>
> Either we always get empty clusters or, after some further commands,
> a segfault.

Yes, thanks, I can reproduce it in 2.15.3, but not in R-prerelease. Maybe this is a side effect of a bug already fixed in R-prerelease. Since R-2.15.3 is frozen now, please upgrade to R-prerelease, which will become R-3.0.0 in April.

Best,
Uwe Ligges

> Regards,
> Detlef Groth
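
The advice to upgrade can be turned into a guard at the top of a script. The sketch below only flags the two versions for which the problem was reported in this thread (2.15.2 and 2.15.3); that list is an assumption drawn from these messages, not an official statement of which releases are affected, and the helper name is_reported_affected is hypothetical.

```r
# Versions for which the problem was reported in this thread; this list is
# an assumption drawn from the messages above, not an official statement.
reported_affected <- c("2.15.2", "2.15.3")

# Hypothetical helper: TRUE if the given R version is one of those reported.
# Accepts a version string or the object returned by getRversion().
is_reported_affected <- function(version = getRversion()) {
  as.character(version) %in% reported_affected
}

if (is_reported_affected()) {
  warning("kmeans() with nstart > 1 was reported to misbehave in this R ",
          "version; consider upgrading to R 3.0.0 or later.")
}
```

Versions that were not mentioned in the thread are deliberately left unflagged, since nothing here says how they behave.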