Hello everyone. I have a few MCLUST questions and I was hoping someone could help me out. If you’re an MCLUST user, they will likely be pretty easy to answer. Thanks in advance for any help.

Ken

What are the pros/cons of starting a finite mixture model at the “m” step versus the “e” step (where “m” is the maximization step and “e” is the expectation step of the EM algorithm)? In particular, are there any reasons for using em(modelName=XXX) versus me(modelName=XXX)? Other than MCLUST, I’ve not seen a finite mixture model “program” give such an option. Would it make sense to fit both models and take the one with the largest log likelihood?

Rather than the hc() function performing cluster analysis for all of G possible clusters, can it be set to only perform a specified number (e.g., set so G=2 only)? Although a minimum number of clusters can be specified, there doesn’t seem to be any way to limit the number of clusters. I want to do a simulation for a fixed number of components, and thus I would like to avoid the unnecessary computations.

Is there any difference between hc(modelName=VVV) and hcVVV or hc(modelName=EEE) and hcEEE, etc.? Likewise, are there any differences between mstep(modelName=VVV) and mstepVVV or mstep(modelName=EEE) and mstepEEE, etc.? If not, why do the same functions have different names?
I can't answer for MCLUST specifically, but in general mixture-modelling terms it is easier to think of a reasonable initial clustering of the data, from which the M-step will quickly produce initial parameter estimates, than to pick a large number of initial parameter values out of the air. (Perhaps you may use a random grouping to start things off if nothing else comes to mind.) Usually, if you try to pick parameter values directly, you will choose values that make some data values very improbable, leading to numerical difficulties in the M-step.

On the other hand, you may have a good set of parameter values from a previously fitted data set and a new but similar set of data, perhaps from a different time period or location. Then it makes sense to start from the parameter values that you have.

Don't worry about the software: it should be just as easy for it to begin at either the E- or the M-step. It is your own intentions and convenience that matter.

Murray Jorgensen

--
Dr Murray Jorgensen http://www.stats.waikato.ac.nz/Staff/maj.html
Department of Statistics, University of Waikato, Hamilton, New Zealand
Email: maj at waikato.ac.nz Fax 7 838 4155
Phone +64 7 838 4773 wk +64 7 849 6486 home Mobile 021 1395 862
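To make the two starting points concrete, here is a minimal sketch in plain Python for a one-dimensional Gaussian mixture. It is not the MCLUST code or its API; the function names (`m_step`, `e_step`, `em_from_parameters`, `em_from_partition`) and the simulated data are assumptions made up for illustration. Starting from a grouping runs the M-step first; starting from parameter values runs the E-step first.

```python
import math
import random

def m_step(data, z):
    """M-step: proportions, means and variances from memberships z[i][k]."""
    n, G = len(data), len(z[0])
    params = []
    for k in range(G):
        nk = sum(z[i][k] for i in range(n))
        mean = sum(z[i][k] * data[i] for i in range(n)) / nk
        var = sum(z[i][k] * (data[i] - mean) ** 2 for i in range(n)) / nk
        params.append((nk / n, mean, var))
    return params

def e_step(data, params):
    """E-step: conditional membership probabilities and the log-likelihood."""
    z, loglik = [], 0.0
    for x in data:
        dens = [p * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for p, m, v in params]
        total = sum(dens)
        z.append([d / total for d in dens])
        loglik += math.log(total)
    return z, loglik

def em_from_parameters(data, params, iters=50):
    """Begin at the E-step: requires initial parameter values.
    Runs a fixed number of iterations (no convergence check)."""
    for _ in range(iters):
        z, _ = e_step(data, params)
        params = m_step(data, z)
    return params, e_step(data, params)[1]

def em_from_partition(data, labels, G, iters=50):
    """Begin at the M-step: requires only an initial grouping of the data."""
    z = [[1.0 if lab == k else 0.0 for k in range(G)] for lab in labels]
    return em_from_parameters(data, m_step(data, z), iters)

# Hypothetical data: two well-separated groups.
random.seed(0)
data = ([random.gauss(0, 1) for _ in range(100)]
        + [random.gauss(5, 1) for _ in range(100)])

# A crude but reasonable initial clustering: split at a threshold.
labels = [0 if x < 2.5 else 1 for x in data]
params, loglik = em_from_partition(data, labels, G=2)
print(params, loglik)
```

In this sketch, a parameter guess far from the data (say a mean of 1000 with a tiny variance) makes every density underflow to zero, and the division in `e_step` then fails. That is one form the numerical difficulties mentioned above can take, and it is avoided when the first parameters come from an M-step on an actual grouping of the data.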
Hi Ken,

1) me starting with a partition converges toward the same result as em starting with the parameters associated with the partition. So there is no point in doing both and seeing which one is better.

2) hc is an agglomerative hierarchical method. This means it starts with n clusters and reduces the number of clusters by 1 in every step. That is, if you want to compute the solution for G=2 clusters, you *have to* compute n, n-1, n-2, ..., G+1 clusters first. By definition, it's not possible to calculate the 2-cluster solution without first computing the solutions with more clusters.

Christian

On Sun, 13 Jun 2004, KKThird at Yahoo.Com wrote:
> [...]

Christian Hennig
Fachbereich Mathematik-SPST/ZMS, Universitaet Hamburg
hennig at math.uni-hamburg.de, http://www.math.uni-hamburg.de/home/hennig/
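Point 2 can be illustrated with a small agglomerative sketch in plain Python. This is not the hc() implementation: hc() uses model-based merging criteria, whereas the sketch below assumes a simple sum-of-squares (Ward-type) criterion on one-dimensional data, and the names `within_ss` and `agglomerate` are hypothetical. What it shows is only the structure of the computation: every level from n down to G+1 is passed through, one merge at a time, although the loop can stop as soon as G clusters remain.

```python
def within_ss(points):
    """Within-cluster sum of squares about the cluster mean."""
    mean = sum(points) / len(points)
    return sum((x - mean) ** 2 for x in points)

def agglomerate(data, G):
    """Merge from n singleton clusters down to G clusters, one merge per step."""
    clusters = [[x] for x in data]
    merges = 0
    while len(clusters) > G:
        best = None
        # Find the pair whose merge increases the total sum of squares least.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = (within_ss(clusters[i] + clusters[j])
                        - within_ss(clusters[i]) - within_ss(clusters[j]))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        merges += 1
    return clusters, merges

# Hypothetical data: seven points in two obvious groups.
data = [0.1, 0.3, 0.2, 5.0, 5.2, 4.9, 0.0]
clusters, merges = agglomerate(data, G=2)
print(clusters, merges)
```

Reaching G=2 from n points always takes n-2 merges in this scheme, so asking for only the two-cluster solution saves just the final merge rather than the work of building the hierarchy.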