Hi All,
I have an n x m matrix. The n rows are individuals, the m columns are variables.
The matrix is itself a collection of 1s (if a variable is observed for an
individual) and 0s (if there is no observation).
Something like:
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    1    1    0    0
[2,]    1    0    1    1    0    0
[3,]    1    0    1    1    0    0
[4,]    0    1    0    0    0    0
[5,]    1    0    1    1    0    0
[6,]    0    1    0    0    1    0
I use kmeans to find 2 or 3 clusters in this matrix
k2 = kmeans(data, 2, 10000000)
k3 = kmeans(data, 3, 10000000)
but I would like to use something a bit more refined, so I thought about an
EM-based clustering. I am using the Mclust() function from the mclust package,
but I get the following (to me incomprehensible) error message:
plot(Mclust(as.data.frame(data)), as.data.frame(data))
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Error in 1:L : NA/NaN argument
In addition: Warning messages:
1: best model occurs at the min or max # of components considered in:
summary.mclustBIC(Bic, data, G = G, modelNames = modelNames)
2: optimal number of clusters occurs at min choice in:
Mclust(as.data.frame(anc.st.mat))
3: insufficient input for specified plot in: coordProj(data = data, parameters =
x$parameters, z = x$z, what = "classification",
That's puzzling because the example given by ?Mclust is something like
plot(Mclust(iris[,-5]), iris[,-5])
which is pretty simple and dumbproof and works flawlessly...
best,
Federico
--
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG
Tel +44 (0)20 7594 1602 Fax (+44) 020 7594 3193
f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com
You could also have a look at the function lca() from package `e1071', which
performs a latent class analysis, e.g.,
library(e1071)
# data must be a matrix of 0/1 observations
fit1 <- lca(data, 2)
fit1
fit2 <- lca(data, 3)
fit2
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
----- Original Message -----
From: "Federico Calboli" <f.calboli at imperial.ac.uk>
To: "r-help" <r-help at stat.math.ethz.ch>
Sent: Wednesday, July 18, 2007 3:37 PM
Subject: [R] EM unsupervised clustering
Federico,

you might also want to have a look at packages "flexclust" or "flexmix", so
you can take into account that you have binary data.

The "mclust" package can be used to estimate mixtures of Gaussian
distributions. "flexclust" implements kmeans-like algorithms, but you can
specify a distance measure appropriate for binary data. "flexmix" allows
latent class analysis with binary data using FLXMCmvbinary() for the
component specific model.

Best,
Bettina
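The two suggestions above could be sketched roughly as follows. This is only a minimal illustration, not code from the thread: the data are simulated, the object names (x, fit_kcca, fit_lca) and the group probabilities are assumptions, and "ejaccard" is just one of the binary-friendly families offered by kccaFamily().

```r
# Sketch under stated assumptions: simulated data, hypothetical object names.
library(flexclust)
library(flexmix)

set.seed(1)

# Simulate 100 individuals x 6 binary variables from two groups that differ
# in the per-column probability of a variable being observed (assumed values).
p1 <- c(0.9, 0.1, 0.9, 0.9, 0.1, 0.1)
p2 <- c(0.1, 0.9, 0.1, 0.1, 0.9, 0.1)
x <- rbind(
  matrix(rbinom(50 * 6, 1, rep(p1, each = 50)), nrow = 50),
  matrix(rbinom(50 * 6, 1, rep(p2, each = 50)), nrow = 50)
)

# Jaccard-type distances are undefined for all-zero rows, so drop them.
x <- x[rowSums(x) > 0, , drop = FALSE]

# flexclust: kmeans-like clustering with a distance suited to binary data.
fit_kcca <- kcca(x, k = 2, family = kccaFamily("ejaccard"))
table(clusters(fit_kcca))

# flexmix: latent class analysis, i.e. a mixture of independent Bernoulli
# variables within each component, fitted by EM.
fit_lca <- flexmix(x ~ 1, k = 2, model = FLXMCmvbinary())
table(clusters(fit_lca))
parameters(fit_lca)
```

Note that flexmix may drop a component whose prior weight becomes very small, so the fitted model can end up with fewer than k classes; rerunning from several random starts (for example with stepFlexmix()) is a common precaution.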