thr3ads.net - R help - [R] A Few MCLUST Questions [Jun 2004]


KKThird@Yahoo.Com

2004-Jun-14 02:29 UTC

[R] A Few MCLUST Questions

Hello everyone. I have a few MCLUST questions and I was hoping someone could
help me out. If you’re an MCLUST user, they will likely be pretty easy to
answer. Thanks in advance for any help.

Ken 

 

1. What are the pros/cons of starting a finite mixture model at the “m” step
versus the “e” step (where “m” is the maximization step and “e” is the
expectation step of the EM algorithm)? In particular, are there any reasons for
using em(modelName=XXX) versus me(modelName=XXX)? Other than MCLUST, I’ve not
seen a finite mixture model “program” give such an option. Would it make sense
to fit both models and take the one with the largest log likelihood?

2. Rather than the hc() function performing cluster analysis for all of G
possible clusters, can it be set to only perform a specified number (e.g., set
so G=2 only)? Although a minimum number of clusters can be specified, there
doesn’t seem to be any way to limit the number of clusters. I want to do a
simulation for a fixed number of components, and thus I would like to avoid the
unnecessary computations.

3. Is there any difference between hc(modelName=VVV) and hcVVV or
hc(modelName=EEE) and hcEEE, etc.? Likewise, are there any differences between
mstep(modelName=VVV) and mstepVVV or mstep(modelName=EEE) and mstepEEE, etc.? If
not, why do the same functions have different names?


		

Murray Jorgensen

2004-Jun-14 02:49 UTC



I can't answer for MCLUST specifically, but in general mixture modelling 
terms it is easier to think of a reasonable initial clustering of the 
data from which the M step will quickly produce initial parameter 
estimates, than to pick a large number of initial parameter values out 
of the air. (Perhaps you may use a random grouping to start things off 
if nothing else comes to mind.) Usually, if you try to pick parameter 
values directly, you will choose ones that make some data values very 
improbable, leading to numerical difficulties in the M-step.

On the other hand you may have a good set of parameter values from a 
previously-fitted data set and you have a new, but similar set of data, 
perhaps from a different time-period or location. Then it will make 
sense to start off from the parameter values that you have.

Don't worry about the software: it should be just as easy for it to 
begin at either the E- or the M-step. It is your own intentions and 
convenience that matter.
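
For readers who want to see the two entry points side by side, here is a
minimal sketch in plain Python of EM for a one-dimensional Gaussian mixture.
It is not MCLUST code: the names m_step, e_step, run_em and run_me are made up
for illustration, and run_em/run_me only loosely mirror the idea behind em()
and me(). Starting from a grouping means running the M step first; starting
from parameter values means running the E step first. The last lines show how
parameter values picked out of the air can give every observation a density
that underflows to zero, so that in this sketch the membership weights cannot
be computed.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def m_step(data, z):
    """Estimate (proportion, mean, variance) per component from weights z[i][k]."""
    n, g = len(data), len(z[0])
    params = []
    for k in range(g):
        nk = sum(z[i][k] for i in range(n))
        mean = sum(z[i][k] * data[i] for i in range(n)) / nk
        var = sum(z[i][k] * (data[i] - mean) ** 2 for i in range(n)) / nk
        params.append((nk / n, mean, var))
    return params


def e_step(data, params):
    """Return membership weights z[i][k] and the log-likelihood under params."""
    z, loglik = [], 0.0
    for x in data:
        dens = [p * normal_pdf(x, m, v) for (p, m, v) in params]
        total = sum(dens)
        if total == 0.0:
            # Every component density underflowed: the weights are undefined.
            raise ValueError("observation %r has zero density under all components" % x)
        z.append([d / total for d in dens])
        loglik += math.log(total)
    return z, loglik


def run_em(data, params, tol=1e-10, max_iter=1000):
    """Hypothetical analogue of em(): begin with the E step from parameter values."""
    old = -math.inf
    for _ in range(max_iter):
        z, loglik = e_step(data, params)
        if loglik - old < tol:
            break
        old = loglik
        params = m_step(data, z)
    return params, loglik


def run_me(data, labels, g, tol=1e-10, max_iter=1000):
    """Hypothetical analogue of me(): begin with the M step from a hard grouping."""
    z = [[1.0 if lab == k else 0.0 for k in range(g)] for lab in labels]
    return run_em(data, m_step(data, z), tol=tol, max_iter=max_iter)


# Made-up data: two well-separated groups and an initial grouping for them.
data = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.3, 4.7, 5.1, 4.9]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

fit_from_grouping = run_me(data, labels, 2)

# The same fit, entered at the E step with the parameters implied by the grouping.
z0 = [[1.0 if lab == k else 0.0 for k in range(2)] for lab in labels]
fit_from_params = run_em(data, m_step(data, z0))

# Parameter values "out of the air": the density of x = 0 underflows to 0.0.
wild_density = normal_pdf(0.0, 1000.0, 0.01)
```

In this sketch run_me() is literally one M step followed by run_em(), which is
why the choice of entry point is a matter of what you have in hand (a grouping
or a set of parameter values) rather than of what the software can do.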

Murray Jorgensen

-- 
Dr Murray Jorgensen      http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz                                Fax 7 838 4155
Phone  +64 7 838 4773 wk    +64 7 849 6486 home    Mobile 021 1395 862

Christian Hennig

2004-Jun-14 16:27 UTC



Hi Ken,

1) me starting with a partition converges toward the same result as
em starting with the parameters associated with the partition. So there is
no point in doing both and seeing which one is better. 

2) hc is an agglomerative hierarchical method. This means that it starts with
n clusters and reduces the number of clusters by 1 in every step. That is,
if you want to compute the solution for G=2 clusters, you *have to*
compute n, n-1, n-2,..., G+1 clusters first. By definition, it's not
possible to compute the 2-cluster solution without first computing the
solutions with more clusters.
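
The merging order in 2) can be sketched in a few lines of plain Python. This is
a hedged illustration only: it uses single linkage on one-dimensional points,
not the model-based merging criterion that hc() uses, and agglomerate is a
hypothetical helper, not an MCLUST function. What it shows is that asking for
G clusters still costs n - G merges, because each step can only join two of
the clusters left by the previous step.

```python
def agglomerate(points, g):
    """Merge singleton clusters, closest pair first, until g clusters remain.

    Returns the clusters (as lists of point indices) and the number of merges.
    Single linkage is an assumption made for brevity.
    """
    clusters = [[i] for i in range(len(points))]  # start with n clusters
    merges = 0
    while len(clusters) > g:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-linkage distance: closest pair of members.
                d = min(abs(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # join the closest two clusters
        del clusters[b]                          # b > a, so index a is unaffected
        merges += 1
    return clusters, merges


# Made-up data: n = 7 points.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.0]
two_clusters, merges_for_two = agglomerate(points, 2)
three_clusters, merges_for_three = agglomerate(points, 3)
```

Stopping at G=2 instead of G=1 saves only the final merge; everything from n
down to G+1 clusters has been computed on the way.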

Christian

***********************************************************************
Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
#######################################################################
I recommend www.boag-online.de
