Hi All,

I have an n x m matrix. The n rows are individuals, the m columns are variables.

The matrix is itself a collection of 1s (if a variable is observed for an individual) and 0s (if there is no observation).

Something like:

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    1    1    0    0
[2,]    1    0    1    1    0    0
[3,]    1    0    1    1    0    0
[4,]    0    1    0    0    0    0
[5,]    1    0    1    1    0    0
[6,]    0    1    0    0    1    0

I use kmeans to find 2 or 3 clusters in this matrix:

k2 = kmeans(data, 2, 10000000)
k3 = kmeans(data, 3, 10000000)

but I would like to use something a bit more refined, so I thought about EM-based clustering. I am using the Mclust() function from the mclust package, but I get the following (to me incomprehensible) error message:

plot(Mclust(as.data.frame(data)), as.data.frame(data))
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Error in 1:L : NA/NaN argument
In addition: Warning messages:
1: best model occurs at the min or max # of components considered in: summary.mclustBIC(Bic, data, G = G, modelNames = modelNames)
2: optimal number of clusters occurs at min choice in: Mclust(as.data.frame(anc.st.mat))
3: insufficient input for specified plot in: coordProj(data = data, parameters = x$parameters, z = x$z, what = "classification",

That's puzzling, because the example given by ?Mclust is something like

plot(Mclust(iris[,-5]), iris[,-5])

which is pretty simple and foolproof and works flawlessly...

best,

Federico

--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
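Mclust fits Gaussian mixtures, which describe each cluster by a mean vector and a covariance matrix. With 0/1 data, constant columns and many identical rows can make those covariances degenerate. That is offered here as a plausible reason the fit struggles, not a confirmed diagnosis of the error above. The sketch below rebuilds the example matrix, checks for both conditions, and reruns kmeans with named arguments. The values iter.max = 100 and nstart = 25 are arbitrary illustrative choices, not recommendations from the thread.

```r
# The 6 x 6 example matrix from the post:
# rows = individuals, columns = variables, 1 = observed, 0 = not observed.
data <- rbind(c(1, 0, 1, 1, 0, 0),
              c(1, 0, 1, 1, 0, 0),
              c(1, 0, 1, 1, 0, 0),
              c(0, 1, 0, 0, 0, 0),
              c(1, 0, 1, 1, 0, 0),
              c(0, 1, 0, 0, 1, 0))

# Quick diagnostics before fitting a Gaussian mixture:
col_var <- apply(data, 2, var)    # column 6 is constant, so its variance is 0
n_distinct <- nrow(unique(data))  # only 3 distinct row patterns among 6 rows
print(col_var)
print(n_distinct)

# In kmeans(x, centers, iter.max, nstart, ...) the third positional argument
# is iter.max; several random starts usually help more than a huge cap.
set.seed(1)
k2 <- kmeans(data, centers = 2, iter.max = 100, nstart = 25)
print(k2$cluster)
```

In the original call, 10000000 was passed as iter.max, so only a single random start was used.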
You could also have a look at the function lca() from package "e1071", which performs a latent class analysis, e.g.,

fit1 <- lca(data, 2)
fit1
fit2 <- lca(data, 3)
fit2

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: med.kuleuven.be/biostat
student.kuleuven.be/~m0390867/dimitris.htm

----- Original Message -----
From: "Federico Calboli" <f.calboli at imperial.ac.uk>
To: "r-help" <r-help at stat.math.ethz.ch>
Sent: Wednesday, July 18, 2007 3:37 PM
Subject: [R] EM unsupervised clustering
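A latent class model treats each individual's 0/1 vector as coming from one of K classes, with the variables independent Bernoulli draws within a class. It is classically fitted by EM, which is the kind of EM-based clustering asked about in this thread. The following is a minimal base-R sketch of that EM, not the code behind lca(). The function names bernoulli_mix_em() and bic_bernoulli_mix(), the eps clamp, the empty-class guard, and the single random start are all hypothetical choices made for illustration.

```r
# Hypothetical helper: EM for a K-class mixture of independent Bernoulli
# variables (a latent class model). Illustrative sketch, single random start.
bernoulli_mix_em <- function(x, K, n_iter = 500, tol = 1e-8, seed = 1) {
  set.seed(seed)
  n <- nrow(x)
  eps <- 1e-6                          # keeps item probabilities off 0 and 1
  z <- matrix(runif(n * K), n, K)      # random soft class memberships
  z <- z / rowSums(z)
  loglik_old <- -Inf
  for (it in seq_len(n_iter)) {
    # M-step: class weights and per-class probability of a 1 in each column
    nk    <- pmax(colSums(z), 1e-10)   # guard against an empty class
    pi_k  <- nk / sum(nk)
    theta <- (t(z) %*% x) / nk         # K x m; row k divided by nk[k]
    theta <- pmin(pmax(theta, eps), 1 - eps)
    # E-step: log P(row i, class k), then normalise with log-sum-exp
    log_p <- x %*% t(log(theta)) + (1 - x) %*% t(log(1 - theta))
    log_p <- sweep(log_p, 2, log(pi_k), "+")
    mx <- apply(log_p, 1, max)
    log_norm <- mx + log(rowSums(exp(log_p - mx)))
    z <- exp(log_p - log_norm)
    loglik <- sum(log_norm)            # observed-data log-likelihood
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(cluster = max.col(z, ties.method = "first"), z = z, pi = pi_k,
       theta = theta, loglik = loglik, iterations = it)
}

# Hypothetical helper: BIC = -2 logL + (number of parameters) log(n).
bic_bernoulli_mix <- function(fit, x) {
  K <- length(fit$pi)
  n_par <- (K - 1) + K * ncol(x)
  -2 * fit$loglik + n_par * log(nrow(x))
}

data <- rbind(c(1, 0, 1, 1, 0, 0), c(1, 0, 1, 1, 0, 0), c(1, 0, 1, 1, 0, 0),
              c(0, 1, 0, 0, 0, 0), c(1, 0, 1, 1, 0, 0), c(0, 1, 0, 0, 1, 0))
fit2 <- bernoulli_mix_em(data, K = 2)
fit3 <- bernoulli_mix_em(data, K = 3)
print(fit2$cluster)
print(c(K2 = bic_bernoulli_mix(fit2, data), K3 = bic_bernoulli_mix(fit3, data)))
```

On the 6 x 6 toy matrix the parameters outnumber the observations, so the BIC comparison only becomes informative on the full n x m data. EM can also stop at a local optimum, so trying several seeds is advisable.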
Federico,

you might also want to have a look at the packages "flexclust" or "flexmix", which let you take into account that you have binary data. The "mclust" package estimates mixtures of Gaussian distributions, whereas your variables are 0/1. "flexclust" implements kmeans-like algorithms, but you can specify a distance measure appropriate for binary data. "flexmix" allows latent class analysis with binary data, using FLXMCmvbinary() as the component-specific model.

Best,
Bettina
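The key idea behind the flexclust suggestion is a distance suited to 0/1 data. As a swapped-in, base-R illustration of that idea (not flexclust's kmeans-like algorithm), the sketch below uses dist(method = "binary"), which computes the Jaccard distance, together with hierarchical clustering. The choice of average linkage is an assumption made for illustration.

```r
# Same 6 x 6 example matrix as in the original post.
data <- rbind(c(1, 0, 1, 1, 0, 0), c(1, 0, 1, 1, 0, 0), c(1, 0, 1, 1, 0, 0),
              c(0, 1, 0, 0, 0, 0), c(1, 0, 1, 1, 0, 0), c(0, 1, 0, 0, 1, 0))

# method = "binary": share of positions where exactly one row has a 1,
# among positions where at least one row has a 1 (the Jaccard distance).
# Shared zeros, such as the empty column 6, do not count as agreement.
d <- dist(data, method = "binary")
print(round(as.matrix(d), 2))

# Average-linkage hierarchical clustering, cut into 2 and 3 groups.
hc <- hclust(d, method = "average")
cl2 <- cutree(hc, k = 2)
cl3 <- cutree(hc, k = 3)
print(cl2)
print(cl3)
```

Ignoring shared absences is the main reason a binary-specific distance can group 0/1 profiles differently from the Euclidean distance used by kmeans.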