Dear R Users, 
I am trying to obtain a best-fit analytic distribution for a dataset
with 11535459 entries. The data range in value from 1 to 300000000. I
use: fitdistr(data, "gamma") to obtain mle's for the parameters.
 
I get the following error:
Error in optim(start, mylogfn, x = x, hessian = TRUE, ...) :
        non-finite finite-difference value [1]
And the following warnings:
NaNs produced in: dgamma(x, shape, scale, log)
I have the same problem with the exponential distribution, but the
lognormal and weibull distributions don't have this problem. 
I suspect the problem has to do with:
"It means that in computing derivatives by finite differencing, one 
of the values is NA, +Inf or -Inf. Put some print values in 
your objective function and find out why it is giving a non-finite
value." - quote by Prof Ripley from r-help
as from the warnings it is clear that some of the values obtained during
maximum likelihood computations are NaN. I cannot however print values
of the objective function, as in my case it is dgamma which is called by
fitdistr, over which I don't have control.  
Can anyone please point me in the right direction?
Lourens
-- 
Lourens Olivier Walters <lwalters at cs.uct.ac.za>
Computer Science Masters Student
University of Cape Town
South Africa
http://people.cs.uct.ac.za/~lwalters/
In my experience, the most likely cause of this problem is that 
optim may try to test nonpositive values for shape or scale.  I avoid 
this situation by programming the log(likelihood) in terms of log(shape) 
and log(scale) as follows: 
 > gammaLoglik <-
+ function(x, logShape, logScale, negative=TRUE){
+ lglk <- sum(dgamma(x, shape=exp(logShape), scale=exp(logScale),
+ log=TRUE))
+ if(negative) return(-lglk) else return(lglk)
+ }
 > tst <- rgamma(10, 1)
 > gammaLoglik(tst, 0, 0)
[1] 12.29849
Then I then call optim directly to minimize the negative of the 
log(likelihood). 
      If I've guessed correctly, this should fix the problem.  If not, 
let us know. 
      hope this helps.  spencer graves
Lourens Olivier Walters wrote:
>Dear R Users, 
>
>I am trying to obtain a best-fit analytic distribution for a dataset
>with 11535459 entries. The data range in value from 1 to 300000000. I
>use: fitdistr(data, "gamma") to obtain mle's for the
parameters.
> 
>I get the following error:
>
>Error in optim(start, mylogfn, x = x, hessian = TRUE, ...) :
>        non-finite finite-difference value [1]
>
>And the following warnings:
>
>NaNs produced in: dgamma(x, shape, scale, log)
>
>I have the same problem with the exponential distribution, but the
>lognormal and weibull distributions don't have this problem. 
>
>I suspect the problem has to do with:
>
>"It means that in computing derivatives by finite differencing, one 
>of the values is NA, +Inf or -Inf. Put some print values in 
>your objective function and find out why it is giving a non-finite
>value." - quote by Prof Ripley from r-help
>
>as from the warnings it is clear that some of the values obtained during
>maximum likelihood computations are NaN. I cannot however print values
>of the objective function, as in my case it is dgamma which is called by
>fitdistr, over which I don't have control.  
>
>Can anyone please point me in the right direction?
>
>Lourens
>
>  
>
Spencer Graves's suggestion of using shape and scale parameters on a log 
scale is a good one.
  To do specifically what you want (check values for which the objective 
function is called and see what happens) you can do the following 
(untested!), which makes a local copy of dgamma that you can mess with:
dgamma.old <- dgamma
dgamma <- function(x,shape,rate,...) {
   d <- dgamma.old(x,shape,rate,...)
   cat(shape,rate,d,"\n")
   return(d)
}
On Tue, 30 Sep 2003, Lourens Olivier Walters wrote:
> Dear R Users, 
> 
> I am trying to obtain a best-fit analytic distribution for a dataset
> with 11535459 entries. The data range in value from 1 to 300000000. I
> use: fitdistr(data, "gamma") to obtain mle's for the
parameters.
>  
> I get the following error:
> 
> Error in optim(start, mylogfn, x = x, hessian = TRUE, ...) :
>         non-finite finite-difference value [1]
> 
> And the following warnings:
> 
> NaNs produced in: dgamma(x, shape, scale, log)
> 
> I have the same problem with the exponential distribution, but the
> lognormal and weibull distributions don't have this problem. 
> 
> I suspect the problem has to do with:
> 
> "It means that in computing derivatives by finite differencing, one 
> of the values is NA, +Inf or -Inf. Put some print values in 
> your objective function and find out why it is giving a non-finite
> value." - quote by Prof Ripley from r-help
> 
> as from the warnings it is clear that some of the values obtained during
> maximum likelihood computations are NaN. I cannot however print values
> of the objective function, as in my case it is dgamma which is called by
> fitdistr, over which I don't have control.  
> 
> Can anyone please point me in the right direction?
> 
> Lourens
> 
> 
-- 
620B Bartram Hall                            bolker at zoo.ufl.edu
Zoology Department, University of Florida    http://www.zoo.ufl.edu/bolker
Box 118525                                   (ph)  352-392-5697
Gainesville, FL 32611-8525                   (fax) 352-392-3704
Thanks, again I managed to get convergence, although looking at the
resultant Pareto distribution plotted in R, it doesn't seem that the log
likelihood estimated parameters are more accurate than the methods of
moments estimator parameters - I will have to look into it. For the
record, here is the code I used: 
###########
data.sorted <- sort(data)
alpha.val <- data.sorted[1]
beta.val <- 1/((1/n) * sum(log(data.sorted/alpha.val)))
log.alpha.val <- log(alpha.val)
# Optimize log.del = log(0.0000001), which is the log(difference 
# between alpha and log likelihood estimated alpha), chose small 
# difference to start of with
log.del = -16.11809565
paretoLoglik <- function(params, negative=TRUE){
  alpha <- data.sorted[1]+exp(params[1])
  lglk <- sum(pareto.density(data.sorted, alpha=exp(alpha),
+ beta=exp(params[2]), log=TRUE))
  if(negative)
     return(-lglk)
  else
     return(lglk)
}
optim.list <- optim(c(log.del, log.beta.val), paretoLoglik,
+ method="BFGS")
pareto.param1 <- exp(optim.list$par[1]) + alpha.val
pareto.param2 <- exp(optim.list$par[2])
##########
On Wed, 2003-10-08 at 15:07, Spencer Graves wrote:>       I'm sorry, but I don't have time to read all your code. 
However, > I saw that you tested for x > alpha in your Pareto distribution 
> example.  Have you considered reparameterizing to estimate log.del = 
> log(alpha-min(x))?  Pass log.del as part of the vector of parameters
to > estimate, then immediately inside the function, compute
> 
>       alpha <- (min(x)+exp(log.del))
> 
>        I've fixed many convergence problems with simple tricks like
this. > 
>       hope this helps.  spencer graves