I am running normalmixEM:

mixmdlscaled <- normalmixEM(data$FCWg)
summary(mixmdlscaled)
plot(mixmdlscaled,which=2)

If I run the program multiple times, I get widely different results:

> mixmdlscaled <- normalmixEM(data$FCWg)
number of iterations= 41
> summary(mixmdlscaled)
summary of normalmixEM object:
          comp 1   comp 2
lambda 0.0818928 0.918107
mu     0.6575938 0.740870
sigma  0.0070562 0.178410
loglik at estimate:  56.87445
> plot(mixmdlscaled,which=2)

> mixmdlscaled <- normalmixEM(data$FCWg)
number of iterations= 357
> summary(mixmdlscaled)
summary of normalmixEM object:
         comp 1    comp 2
lambda 0.959912 0.0400879
mu     0.722022 1.0220719
sigma  0.165454 0.0131391
loglik at estimate:  53.66051
> plot(mixmdlscaled,which=2)

I understand that when run without specifying various parameters (e.g. mu, or sigma) values are chosen randomly from a normal distribution with center(s) determined from binning the data. Despite this, would not one expect the results to be similar? If one is not to expect similar results, how can I get a solution in which I can have confidence? Should I run the program multiple times and take the average of the results? Should I look for the solution with the best log likelihood?

Thank you,
John

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)

You could start by sorting the components, by lambda (size) or by mu (mean), since the order of the components is random if you don't supply starting values. You could use the following to sort normalmixEM's output:

sort.mixEM <- function (x, decreasing = FALSE, ..., by = "lambda")
{
    # 'by' must name an element of the fitted object, e.g. "lambda" or "mu"
    stopifnot(inherits(x, "mixEM"), is.element(by, names(x)))
    o <- order(x[[by]], decreasing = decreasing)
    # apply the same permutation to every per-component quantity
    x$lambda <- x$lambda[o]
    x$sigma <- x$sigma[o]
    x$mu <- x$mu[o]
    x$posterior[] <- x$posterior[, o, drop = FALSE]
    x
}

Bill Dunlap
TIBCO Software
wdunlap tibco.com
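
To see what sorting buys, here is a small sketch that applies the sort.mixEM helper above to the two summaries from the original question. It does not call normalmixEM; `mock_fit` is a hypothetical stand-in that only carries the reported lambda, mu and sigma, and its `posterior` entry is a placeholder so the helper has a matrix to permute.

```r
# Bill Dunlap's helper from the message above, repeated so this block runs alone.
sort.mixEM <- function(x, decreasing = FALSE, ..., by = "lambda") {
  stopifnot(inherits(x, "mixEM"), is.element(by, names(x)))
  o <- order(x[[by]], decreasing = decreasing)
  x$lambda <- x$lambda[o]
  x$sigma <- x$sigma[o]
  x$mu <- x$mu[o]
  x$posterior[] <- x$posterior[, o, drop = FALSE]
  x
}

# Hypothetical stand-in for a fitted object (assumption: only these fields matter here).
mock_fit <- function(lambda, mu, sigma) {
  structure(
    list(lambda = lambda, mu = mu, sigma = sigma,
         posterior = matrix(lambda, nrow = 1)),  # placeholder, not real posteriors
    class = "mixEM"
  )
}

# Estimates copied from the two summaries in the question.
run1 <- mock_fit(c(0.0818928, 0.918107), c(0.6575938, 0.740870), c(0.0070562, 0.178410))
run2 <- mock_fit(c(0.959912, 0.0400879), c(0.722022, 1.0220719), c(0.165454, 0.0131391))

# Sort both runs so the smaller component always comes first.
s1 <- sort(run1, by = "lambda")
s2 <- sort(run2, by = "lambda")

comparison <- rbind(
  run1 = c(lambda = s1$lambda, mu = s1$mu, sigma = s1$sigma),
  run2 = c(lambda = s2$lambda, mu = s2$mu, sigma = s2$sigma)
)
print(comparison)
```

Once aligned this way, the large component looks similar in both runs (mu near 0.72 to 0.74, sigma near 0.17 to 0.18), while the small component sits at mu of about 0.66 in one run and about 1.02 in the other. So the two runs differ by more than a relabeling of the components.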

On Wed, 27 Jan 2016 11:51:07 -0500 John Sorkin <JSorkin at grecc.umaryland.edu> wrote:

> I understand that when run without specifying various parameters (e.g. mu, or sigma) values are chosen randomly from a normal distribution with center(s) determined from binning the data.

I don't know what this means or what the mechanics are.

> Despite this, would not one expect the results to be similar? If one is not to expect similar results, how can I get a solution in which I can have confidence? Should I run the program multiple times and take the average of the results? Should I look for the solution with the best log likelihood?

But if a likelihood has several local maxima with respect to its parameters, isn't this what you would expect? I don't know how familiar you are with statistics, so maybe I am repeating something that you already know, but an MLE (note the indefinite article) is what EM, or any iterative/root-finding method, finds in the vicinity of its initialization.

Your best bet is to use a package such as EMCluster. If you want to keep using the package above, you should make several runs and then choose the one that gives a stable solution and the highest log-likelihood value. EMCluster does this for you.

HTH!
Best wishes,
Ranjan
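
The "make several runs and keep the one with the highest log-likelihood" advice can be sketched roughly as follows. `best_of_restarts` is a hypothetical helper written for this illustration, not part of any package; it only assumes that the fitting function returns a list with a `loglik` element, as normalmixEM does. The demonstration data are simulated, since the original `data$FCWg` is not available, and the demo is skipped if mixtools (the package that provides normalmixEM) is not installed.

```r
# Hypothetical helper: call fit_fun() n_starts times and keep the fit with
# the highest log-likelihood. Assumes each fit is a list with $loglik.
best_of_restarts <- function(fit_fun, n_starts = 20, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  fits <- vector("list", n_starts)
  for (i in seq_len(n_starts)) {
    # A failed run is stored as NULL and dropped below.
    fits[i] <- list(tryCatch(fit_fun(), error = function(e) NULL))
  }
  fits <- Filter(Negate(is.null), fits)
  stopifnot(length(fits) > 0)
  logliks <- vapply(fits, function(f) f$loglik, numeric(1))
  list(best = fits[[which.max(logliks)]],
       logliks = sort(logliks, decreasing = TRUE))
}

# Demo on simulated data (an assumption; not the data from the question).
if (requireNamespace("mixtools", quietly = TRUE)) {
  set.seed(42)
  y <- c(rnorm(150, mean = 0, sd = 1), rnorm(50, mean = 4, sd = 0.5))
  res <- best_of_restarts(function() mixtools::normalmixEM(y, k = 2),
                          n_starts = 10)
  print(round(res$logliks, 3))  # many runs near the top value suggests stability
  print(summary(res$best))
}
```

Looking at the whole vector of log-likelihoods, rather than only the maximum, gives a rough sense of how stable the best solution is. It is also worth inspecting the selected fit: with unequal variances, a component with a very small sigma can inflate the log-likelihood without being a meaningful solution.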