thr3ads.net - R help - [R] EM unsupervised clustering [Jul 2007]


Federico Calboli

2007-Jul-18 13:37 UTC

[R] EM unsupervised clustering

Hi All,

I have an n x m matrix. The n rows are individuals, the m columns are variables.

The matrix consists of 1s (where a variable is observed for an 
individual) and 0s (where there is no observation).

Something like:

      [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    0    1    1    0    0
[2,]    1    0    1    1    0    0
[3,]    1    0    1    1    0    0
[4,]    0    1    0    0    0    0
[5,]    1    0    1    1    0    0
[6,]    0    1    0    0    1    0


I use kmeans to find 2 or 3 clusters in this matrix

k2 = kmeans(data, 2, 10000000)
k3 = kmeans(data, 3, 10000000)

but I would like to use something a bit more refined, so I thought about an 
EM-based clustering. I am using the Mclust() function from the mclust package, 
but I get the following (to me incomprehensible) error message:

plot(Mclust(as.data.frame(data)), as.data.frame(data))
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Hit <Return> to see next plot:
Error in 1:L : NA/NaN argument
In addition: Warning messages:
1: best model occurs at the min or max # of components considered in: 
summary.mclustBIC(Bic, data, G = G, modelNames = modelNames)
2: optimal number of clusters occurs at min choice in: 
Mclust(as.data.frame(anc.st.mat))
3: insufficient input for specified plot in: coordProj(data = data, parameters =
x$parameters, z = x$z, what = "classification",

That's puzzling because the example given by ?Mclust is something like

plot(Mclust(iris[,-5]), iris[,-5])

which is pretty simple and dumbproof and works flawlessly...
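
One difference from the iris example can be checked directly: the 0/1 matrix above has a column with no variation and several identical rows, so its sample covariance is rank-deficient, which is awkward for a Gaussian mixture. The sketch below uses base R only; the name `x` is an assumption standing in for `data`, and whether this degeneracy is what triggers the `1:L` error is a guess, not something the thread confirms.

```r
# The example matrix from the post, rebuilt row by row.
x <- matrix(c(1, 0, 1, 1, 0, 0,
              1, 0, 1, 1, 0, 0,
              1, 0, 1, 1, 0, 0,
              0, 1, 0, 0, 0, 0,
              1, 0, 1, 1, 0, 0,
              0, 1, 0, 0, 1, 0),
            nrow = 6, byrow = TRUE)

# Variance of each column; a value of 0 means the column is constant.
col_var <- apply(x, 2, var)
constant_cols <- which(col_var == 0)

# Number of rows that repeat an earlier row exactly.
n_dup <- sum(duplicated(x))

# Rank of the sample covariance matrix, to compare with ncol(x).
cov_rank <- qr(cov(x))$rank

print(constant_cols)
print(n_dup)
print(c(rank = cov_rank, columns = ncol(x)))
```

If the rank is smaller than the number of columns, full-covariance Gaussian components cannot be estimated without some form of regularisation.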

best,

Federico

-- 
Federico C. F. Calboli
Department of Epidemiology and Public Health
Imperial College, St Mary's Campus
Norfolk Place, London W2 1PG

Tel  +44 (0)20 7594 1602     Fax (+44) 020 7594 3193

f.calboli [.a.t] imperial.ac.uk
f.calboli [.a.t] gmail.com

Dimitris Rizopoulos

2007-Jul-18 13:48 UTC


[R] EM unsupervised clustering

You could also have a look at the function lca() from the package `e1071', 
which performs a latent class analysis, e.g.,

fit1 <- lca(data, 2)
fit1

fit2 <- lca(data, 3)
fit2

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm




Bettina Gruen

2007-Jul-19 06:14 UTC


[R] EM unsupervised clustering

Federico,

you might also want to have a look at the packages "flexclust" or 
"flexmix", so you can take into account that you have binary data. The 
"mclust" package can be used to estimate mixtures of Gaussian 
distributions. "flexclust" implements kmeans-like algorithms, but you can 
specify a distance measure appropriate for binary data. "flexmix" allows 
latent class analysis with binary data using FLXMCmvbinary() for the 
component specific model.
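
To make the idea of latent class analysis on binary data concrete, here is a minimal base-R sketch of EM for a mixture of independent Bernoulli variables. It is not the flexmix or e1071 implementation; the function name `bernoulli_mixture_em` and its arguments (`n_iter`, `tol`, `eps`) are hypothetical, and `x` stands in for the `data` matrix from the original post.

```r
# EM for a k-component mixture of independent Bernoulli variables
# (a basic latent class model). Hypothetical helper, base R only.
bernoulli_mixture_em <- function(x, k, n_iter = 200, tol = 1e-8, eps = 1e-6) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Start from random soft assignments (each row sums to 1).
  z <- matrix(runif(n * k), n, k)
  z <- z / rowSums(z)
  loglik_old <- -Inf
  for (iter in seq_len(n_iter)) {
    # M-step: mixing weights and per-cluster probabilities of a 1.
    nk <- colSums(z) + eps                    # eps guards against empty clusters
    weights <- nk / sum(nk)
    theta <- t(z) %*% x / nk                  # k x m, row j divided by nk[j]
    theta <- pmin(pmax(theta, eps), 1 - eps)  # keep log() finite
    # E-step: log(weight_j) + log p(x_i | cluster j), an n x k matrix.
    log_p <- x %*% t(log(theta)) + (1 - x) %*% t(log(1 - theta))
    log_p <- sweep(log_p, 2, log(weights), "+")
    # Normalise each row with the log-sum-exp trick.
    row_max <- apply(log_p, 1, max)
    log_norm <- row_max + log(rowSums(exp(log_p - row_max)))
    z <- exp(log_p - log_norm)
    loglik <- sum(log_norm)
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(weights = weights, theta = theta, posterior = z,
       cluster = max.col(z, ties.method = "first"), loglik = loglik)
}

# The example matrix from the original post.
x <- matrix(c(1, 0, 1, 1, 0, 0,
              1, 0, 1, 1, 0, 0,
              1, 0, 1, 1, 0, 0,
              0, 1, 0, 0, 0, 0,
              1, 0, 1, 1, 0, 0,
              0, 1, 0, 0, 1, 0),
            nrow = 6, byrow = TRUE)

# EM can stop in a local optimum, so run several random starts
# and keep the fit with the highest log-likelihood.
set.seed(1)
fits <- lapply(1:10, function(i) bernoulli_mixture_em(x, k = 2))
best <- fits[[which.max(sapply(fits, function(f) f$loglik))]]
print(best$cluster)
print(round(best$posterior, 3))
```

The posterior matrix gives soft cluster memberships, which is the main thing an EM-based approach adds over kmeans. Choosing the number of classes (for example by BIC) is left out of this sketch.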

Best,
Bettina


