Bob Sandefur
2001-Aug-28 22:10 UTC
[R] fitting a mixture of distributions with optim and max log likelihood ? Clarifications
Hi- Professor Ripley raised some questions on my first email: Are you on XP? I am on Release candidate 1 from MSDN professional subscriptions, actually. Anyone (although perhaps only in US) can get about the same thing from the MicroSoft Preview program for about $20 US. (This release is time limited). cygwin works for me so far but I only use the shell, shell scripts, awk and sed on a regular basis (gcc and g77 are okay as far as I have used them) and the cygwin installer happly updated my windows XP verison What were warnings? lWarning messages: 1: NaNs produced in: log(x) ... 22: NaNs produced in: log(x) I really want is to fit mixtures of 2 or 3 lognormal distributions that are discrete to the nearest 0.001 (gold assay values in ounces/ton) with both a normal error of about + or - 0.003 and a proportional error of about + or - 5% constrained to be non negative so I think analytical derivatives are out the window. Could you point me to a reference on log-sds? thanx bob Thread follows:>>> Prof Brian D Ripley 08/28/01 03:10PM >>>On Tue, 28 Aug 2001, Bob Sandefur wrote:> hi > Suppose I have a mixture of 2 distributions generated byThis is well-known difficult optimization problem with many local maxima. I think in particular that your test example has too broad a second normal for this to work well. The default method for optim() is slow, not very precise but very reliable. I very rarely use it. With a problem like this where derivatives are readily available I would BFGS with analytical derivatives and multiple starting points to check for local maxima. BTw, what are those warnings? You actually have a constrained space for the optimization, and I would have worked with log-sds to avoid that.> > rtwonormals <- function(npnt,m1,s1,m2,s2,p2){ > rv<-vector(npnt,mode="numeric") > for( i in seq(1:npnt)){ > if(runif(1,0,1)<=p2){ > rv[i]<-rnorm(1,m2,s2) > } > else{ > rv[i]<-rnorm(1,m1,s1) > } > } > return(rv) > } > x <- rtwonormals(50000,0,100,500,500,0.05) > > #and I try to fit these with (based on thread: [R] EstimatingWeibull Distribution Parameters - very basic question)> > loglike<-function(p)-2*sum(log((1-p[5])*dnorm(x,p[1],p[2])+p[5]*dnorm(x,p[3],p[4])))> optim(c(-20,150,400,600,.035),loglike) > optim(c(20,70,600,400,.095),loglike) > optim(c(0,100,500,500,.05),loglike) > optim(c(-20,150,400,600,.035),loglike) > > # three different starting values (1 and 4 the same to checkreproducablity) I get:> > Version 1.3.0 (2001-06-22) on windoze XPInteresting. I thought XP was to be released on Oct 25, but there were rumours that MicroSoft had changed this. Do you really have XP, or a preview? If this really does run under the real XP, we should add that to the readme etc. Quite a few things, including our Cygwin tools, do not run under XP, allegedly.> > optim(c(-20,150,400,600,.035),loglike) > $par > [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778 > $value > [1] 622843.1 > $counts > function gradient > 493 NA > $convergence > [1] 0 > $message > NULL > There were 22 warnings (use warnings() to see them) > > > optim(c(20,70,600,400,.095),loglike) > $par > [1] 0.62742812 100.15891023 533.25825184 514.63882147 0.04670099 > $value > [1] 622692.7 > $couns > function gradient > 501 NA > $convergence > [1] 1 < OPPS > $message > NULL > There were 22 warnings (use warnings() to see them) > > > optim(c(0,100,500,500,.05),loglike) > $par > [1] 0.56254342 100.03881574 499.47434961 505.59785487 0.04805347 > $value > [1] 622685 > $counts > function gradient > 109 NA > $convergence > [1] 0 > $message > NULL > There were 21 warnings (use warnings() to see them) > > > optim(c(-20,150,400,600,.035),loglike) > $par > [1] 1.28597210 100.53443070 550.06070497 615.06936388 0.04563778 > $value > [1] 622843.1 > $counts > function gradient > 493 NA > $convergence > [1] 0 > $message > NULL > There were 22 warnings (use warnings() to see them) > > > Questions: > 1) Did I mess up anything in the formulae? > 2) Any suggestions for converging to the same value? > 3) Any suggestions for other methods to get means, stddevs andproportions of the mixture of distributions?> > Thanx > > Robert (Bob) L Sandefur > Principal Geostatistician > Pincock Allen & Holt (A Hart Crowser Company) > International Minerals Consultants > 274 Union Suite 200 > Lakewood CO 80228 > USA > 303 914-4467 v > 303 987-8907 f > rls at pincock.com > > > ~ > ~ > >-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-. -.-.-.-> r-help mailing list -- Readhttp://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html> Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To:r-help-request at stat.math.ethz.ch>_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._. _._._._>-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -------------- next part -------------- An HTML attachment was scrubbed... URL: https://stat.ethz.ch/pipermail/r-help/attachments/20010828/916998c7/attachment.html