Bob Sandefur
2001-Aug-28 20:59 UTC
[R] fitting a mixture of distributions with optim and max log likelihood ?
hi Suppose I have a mixture of 2 distributions generated by rtwonormals <- function(npnt,m1,s1,m2,s2,p2){ rv<-vector(npnt,mode="numeric") for( i in seq(1:npnt)){ if(runif(1,0,1)<=p2){ rv[i]<-rnorm(1,m2,s2) } else{ rv[i]<-rnorm(1,m1,s1) } } return(rv) } x <- rtwonormals(50000,0,100,500,500,0.05) #and I try to fit these with (based on thread: [R] Estimating Weibull Distribution Parameters - very basic question) loglike<-function(p) -2*sum(log((1-p[5])*dnorm(x,p[1],p[2])+p[5]*dnorm(x,p[3],p[4]))) optim(c(-20,150,400,600,.035),loglike) optim(c(20,70,600,400,.095),loglike) optim(c(0,100,500,500,.05),loglike) optim(c(-20,150,400,600,.035),loglike) # three different starting values (1 and 4 the same to check reproducablity) I get: Version 1.3.0 (2001-06-22) on windoze XP> optim(c(-20,150,400,600,.035),loglike)$par [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778 $value [1] 622843.1 $counts function gradient 493 NA $convergence [1] 0 $message NULL There were 22 warnings (use warnings() to see them)> optim(c(20,70,600,400,.095),loglike)$par [1] 0.62742812 100.15891023 533.25825184 514.63882147 0.04670099 $value [1] 622692.7 $couns function gradient 501 NA $convergence [1] 1 < OPPS $message NULL There were 22 warnings (use warnings() to see them)> optim(c(0,100,500,500,.05),loglike)$par [1] 0.56254342 100.03881574 499.47434961 505.59785487 0.04805347 $value [1] 622685 $counts function gradient 109 NA $convergence [1] 0 $message NULL There were 21 warnings (use warnings() to see them)> optim(c(-20,150,400,600,.035),loglike)$par [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778 $value [1] 622843.1 $counts function gradient 493 NA $convergence [1] 0 $message NULL There were 22 warnings (use warnings() to see them) Questions: 1) Did I mess up anything in the formulae? 2) Any suggestions for converging to the same value? 3) Any suggestions for other methods to get means, stddevs and proportions of the mixture of distributions? Thanx Robert (Bob) L Sandefur Principal Geostatistician Pincock Allen & Holt (A Hart Crowser Company) International Minerals Consultants 274 Union Suite 200 Lakewood CO 80228 USA 303 914-4467 v 303 987-8907 f rls at pincock.com ~ ~ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Prof Brian D Ripley
2001-Aug-28 21:10 UTC
[R] fitting a mixture of distributions with optim and max log likelihood ?
On Tue, 28 Aug 2001, Bob Sandefur wrote:> hi > Suppose I have a mixture of 2 distributions generated byThis is well-known difficult optimization problem with many local maxima. I think in particular that your test example has too broad a second normal for this to work well. The default method for optim() is slow, not very precise but very reliable. I very rarely use it. With a problem like this where derivatives are readily available I would BFGS with analytical derivatives and multiple starting points to check for local maxima. BTw, what are those warnings? You actually have a constrained space for the optimization, and I would have worked with log-sds to avoid that.> > rtwonormals <- function(npnt,m1,s1,m2,s2,p2){ > rv<-vector(npnt,mode="numeric") > for( i in seq(1:npnt)){ > if(runif(1,0,1)<=p2){ > rv[i]<-rnorm(1,m2,s2) > } > else{ > rv[i]<-rnorm(1,m1,s1) > } > } > return(rv) > } > x <- rtwonormals(50000,0,100,500,500,0.05) > > #and I try to fit these with (based on thread: [R] Estimating Weibull Distribution Parameters - very basic question) > > loglike<-function(p) -2*sum(log((1-p[5])*dnorm(x,p[1],p[2])+p[5]*dnorm(x,p[3],p[4]))) > optim(c(-20,150,400,600,.035),loglike) > optim(c(20,70,600,400,.095),loglike) > optim(c(0,100,500,500,.05),loglike) > optim(c(-20,150,400,600,.035),loglike) > > # three different starting values (1 and 4 the same to check reproducablity) I get: > > Version 1.3.0 (2001-06-22) on windoze XPInteresting. I thought XP was to be released on Oct 25, but there were rumours that MicroSoft had changed this. Do you really have XP, or a preview? If this really does run under the real XP, we should add that to the readme etc. Quite a few things, including our Cygwin tools, do not run under XP, allegedly.> > optim(c(-20,150,400,600,.035),loglike) > $par > [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778 > $value > [1] 622843.1 > $counts > function gradient > 493 NA > $convergence > [1] 0 > $message > NULL > There were 22 warnings (use warnings() to see them) > > > optim(c(20,70,600,400,.095),loglike) > $par > [1] 0.62742812 100.15891023 533.25825184 514.63882147 0.04670099 > $value > [1] 622692.7 > $couns > function gradient > 501 NA > $convergence > [1] 1 < OPPS > $message > NULL > There were 22 warnings (use warnings() to see them) > > > optim(c(0,100,500,500,.05),loglike) > $par > [1] 0.56254342 100.03881574 499.47434961 505.59785487 0.04805347 > $value > [1] 622685 > $counts > function gradient > 109 NA > $convergence > [1] 0 > $message > NULL > There were 21 warnings (use warnings() to see them) > > > optim(c(-20,150,400,600,.035),loglike) > $par > [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778 > $value > [1] 622843.1 > $counts > function gradient > 493 NA > $convergence > [1] 0 > $message > NULL > There were 22 warnings (use warnings() to see them) > > > Questions: > 1) Did I mess up anything in the formulae? > 2) Any suggestions for converging to the same value? > 3) Any suggestions for other methods to get means, stddevs and proportions of the mixture of distributions? > > Thanx > > Robert (Bob) L Sandefur > Principal Geostatistician > Pincock Allen & Holt (A Hart Crowser Company) > International Minerals Consultants > 274 Union Suite 200 > Lakewood CO 80228 > USA > 303 914-4467 v > 303 987-8907 f > rls at pincock.com > > > ~ > ~ > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._ >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
ben@zoo.ufl.edu
2001-Aug-28 21:20 UTC
[R] fitting a mixture of distributions with optim and max log likelihood ?
What you're doing seems reasonable, but this is a hard problem in general. My first advice would be to check out MASS (Modern Applied Statistics with S-PLUS), 3d ed., pp. 261-266; this gives methods for doing the problem in S-PLUS, and several references to the literature on mixture models. Many of the functions used in MASS to solve this problem are specific to S-PLUS, but there are equivalents in R (deriv3 <-> D, etc.). The online R complements at http://www.stats.ox.ac.uk/pub/MASS3/Compl.shtml should help you out. Ben Bolker On Tue, 28 Aug 2001, Bob Sandefur wrote:> hi > Suppose I have a mixture of 2 distributions generated by > > rtwonormals <- function(npnt,m1,s1,m2,s2,p2){ > rv<-vector(npnt,mode="numeric") > for( i in seq(1:npnt)){ > if(runif(1,0,1)<=p2){ > rv[i]<-rnorm(1,m2,s2) > } > else{ > rv[i]<-rnorm(1,m1,s1) > } > } > return(rv) > } > x <- rtwonormals(50000,0,100,500,500,0.05) > > #and I try to fit these with (based on thread: [R] Estimating Weibull Distribution Parameters - very basic question) > > loglike<-function(p) -2*sum(log((1-p[5])*dnorm(x,p[1],p[2])+p[5]*dnorm(x,p[3],p[4]))) > optim(c(-20,150,400,600,.035),loglike) > optim(c(20,70,600,400,.095),loglike) > optim(c(0,100,500,500,.05),loglike) > optim(c(-20,150,400,600,.035),loglike) > > # three different starting values (1 and 4 the same to check reproducablity) I get: > > Version 1.3.0 (2001-06-22) on windoze XP > > > optim(c(-20,150,400,600,.035),loglike) > $par > [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778 > $value > [1] 622843.1 > $counts > function gradient > 493 NA > $convergence > [1] 0 > $message > NULL > There were 22 warnings (use warnings() to see them) > > > optim(c(20,70,600,400,.095),loglike) > $par > [1] 0.62742812 100.15891023 533.25825184 514.63882147 0.04670099 > $value > [1] 622692.7 > $couns > function gradient > 501 NA > $convergence > [1] 1 < OPPS > $message > NULL > There were 22 warnings (use warnings() to see them) > > > optim(c(0,100,500,500,.05),loglike) > $par > [1] 0.56254342 100.03881574 499.47434961 505.59785487 0.04805347 > $value > [1] 622685 > $counts > function gradient > 109 NA > $convergence > [1] 0 > $message > NULL > There were 21 warnings (use warnings() to see them) > > > optim(c(-20,150,400,600,.035),loglike) > $par > [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778 > $value > [1] 622843.1 > $counts > function gradient > 493 NA > $convergence > [1] 0 > $message > NULL > There were 22 warnings (use warnings() to see them) > > > Questions: > 1) Did I mess up anything in the formulae? > 2) Any suggestions for converging to the same value? > 3) Any suggestions for other methods to get means, stddevs and proportions of the mixture of distributions? > > Thanx > > Robert (Bob) L Sandefur > Principal Geostatistician > Pincock Allen & Holt (A Hart Crowser Company) > International Minerals Consultants > 274 Union Suite 200 > Lakewood CO 80228 > USA > 303 914-4467 v > 303 987-8907 f > rls at pincock.com > > > ~ > ~ > > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._ >-- 318 Carr Hall bolker at zoo.ufl.edu Zoology Department, University of Florida http://www.zoo.ufl.edu/bolker Box 118525 (ph) 352-392-5697 Gainesville, FL 32611-8525 (fax) 352-392-3704 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._