Dear R Users, I am trying to obtain a best-fit analytic distribution for a dataset with 11535459 entries. The data range in value from 1 to 300000000. I use: fitdistr(data, "gamma") to obtain mle's for the parameters. I get the following error: Error in optim(start, mylogfn, x = x, hessian = TRUE, ...) : non-finite finite-difference value [1] And the following warnings: NaNs produced in: dgamma(x, shape, scale, log) I have the same problem with the exponential distribution, but the lognormal and weibull distributions don't have this problem. I suspect the problem has to do with: "It means that in computing derivatives by finite differencing, one of the values is NA, +Inf or -Inf. Put some print values in your objective function and find out why it is giving a non-finite value." - quote by Prof Ripley from r-help as from the warnings it is clear that some of the values obtained during maximum likelihood computations are NaN. I cannot however print values of the objective function, as in my case it is dgamma which is called by fitdistr, over which I don't have control. Can anyone please point me in the right direction? Lourens -- Lourens Olivier Walters <lwalters at cs.uct.ac.za> Computer Science Masters Student University of Cape Town South Africa people.cs.uct.ac.za/~lwalters
In my experience, the most likely cause of this problem is that optim may try to test nonpositive values for shape or scale. I avoid this situation by programming the log(likelihood) in terms of log(shape) and log(scale) as follows: > gammaLoglik <- + function(x, logShape, logScale, negative=TRUE){ + lglk <- sum(dgamma(x, shape=exp(logShape), scale=exp(logScale), + log=TRUE)) + if(negative) return(-lglk) else return(lglk) + } > tst <- rgamma(10, 1) > gammaLoglik(tst, 0, 0) [1] 12.29849 Then I then call optim directly to minimize the negative of the log(likelihood). If I've guessed correctly, this should fix the problem. If not, let us know. hope this helps. spencer graves Lourens Olivier Walters wrote:>Dear R Users, > >I am trying to obtain a best-fit analytic distribution for a dataset >with 11535459 entries. The data range in value from 1 to 300000000. I >use: fitdistr(data, "gamma") to obtain mle's for the parameters. > >I get the following error: > >Error in optim(start, mylogfn, x = x, hessian = TRUE, ...) : > non-finite finite-difference value [1] > >And the following warnings: > >NaNs produced in: dgamma(x, shape, scale, log) > >I have the same problem with the exponential distribution, but the >lognormal and weibull distributions don't have this problem. > >I suspect the problem has to do with: > >"It means that in computing derivatives by finite differencing, one >of the values is NA, +Inf or -Inf. Put some print values in >your objective function and find out why it is giving a non-finite >value." - quote by Prof Ripley from r-help > >as from the warnings it is clear that some of the values obtained during >maximum likelihood computations are NaN. I cannot however print values >of the objective function, as in my case it is dgamma which is called by >fitdistr, over which I don't have control. > >Can anyone please point me in the right direction? > >Lourens > > >
Spencer Graves's suggestion of using shape and scale parameters on a log scale is a good one. To do specifically what you want (check values for which the objective function is called and see what happens) you can do the following (untested!), which makes a local copy of dgamma that you can mess with: dgamma.old <- dgamma dgamma <- function(x,shape,rate,...) { d <- dgamma.old(x,shape,rate,...) cat(shape,rate,d,"\n") return(d) } On Tue, 30 Sep 2003, Lourens Olivier Walters wrote:> Dear R Users, > > I am trying to obtain a best-fit analytic distribution for a dataset > with 11535459 entries. The data range in value from 1 to 300000000. I > use: fitdistr(data, "gamma") to obtain mle's for the parameters. > > I get the following error: > > Error in optim(start, mylogfn, x = x, hessian = TRUE, ...) : > non-finite finite-difference value [1] > > And the following warnings: > > NaNs produced in: dgamma(x, shape, scale, log) > > I have the same problem with the exponential distribution, but the > lognormal and weibull distributions don't have this problem. > > I suspect the problem has to do with: > > "It means that in computing derivatives by finite differencing, one > of the values is NA, +Inf or -Inf. Put some print values in > your objective function and find out why it is giving a non-finite > value." - quote by Prof Ripley from r-help > > as from the warnings it is clear that some of the values obtained during > maximum likelihood computations are NaN. I cannot however print values > of the objective function, as in my case it is dgamma which is called by > fitdistr, over which I don't have control. > > Can anyone please point me in the right direction? > > Lourens > >-- 620B Bartram Hall bolker at zoo.ufl.edu Zoology Department, University of Florida zoo.ufl.edu/bolker Box 118525 (ph) 352-392-5697 Gainesville, FL 32611-8525 (fax) 352-392-3704
Thanks, again I managed to get convergence, although looking at the resultant Pareto distribution plotted in R, it doesn't seem that the log likelihood estimated parameters are more accurate than the methods of moments estimator parameters - I will have to look into it. For the record, here is the code I used: ########### data.sorted <- sort(data) alpha.val <- data.sorted[1] beta.val <- 1/((1/n) * sum(log(data.sorted/alpha.val))) log.alpha.val <- log(alpha.val) # Optimize log.del = log(0.0000001), which is the log(difference # between alpha and log likelihood estimated alpha), chose small # difference to start of with log.del = -16.11809565 paretoLoglik <- function(params, negative=TRUE){ alpha <- data.sorted[1]+exp(params[1]) lglk <- sum(pareto.density(data.sorted, alpha=exp(alpha), + beta=exp(params[2]), log=TRUE)) if(negative) return(-lglk) else return(lglk) } optim.list <- optim(c(log.del, log.beta.val), paretoLoglik, + method="BFGS") pareto.param1 <- exp(optim.list$par[1]) + alpha.val pareto.param2 <- exp(optim.list$par[2]) ########## On Wed, 2003-10-08 at 15:07, Spencer Graves wrote:> I'm sorry, but I don't have time to read all your code.However,> I saw that you tested for x > alpha in your Pareto distribution > example. Have you considered reparameterizing to estimate log.del = > log(alpha-min(x))? Pass log.del as part of the vector of parametersto> estimate, then immediately inside the function, compute > > alpha <- (min(x)+exp(log.del)) > > I've fixed many convergence problems with simple tricks likethis.> > hope this helps. spencer graves