thr3ads.net - R help - [R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros' [Jun 2008]

If this information is useful, please help other people find it:
Share via:

Fox, Aaron

2008-Jun-11 18:22 UTC

[R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros'

Greetings, all

I am having difficulty getting the fitdistr() function to return without
an error on my data. Specifically, what I'm trying to do is get a
parameter estimation for fracture intensity data in a well / borehole.
Lower bound is 0 (no fractures in the selected data interval), and upper
bound is ~ 10 - 50, depending on what scale you are conducting the
analysis on.

I read in the data from a text file, convert it to numerics, and then
calculate initial estimates of the shape and scale parameters for the
gamma distribution from moments. I then feed this back into the
fitdistr() function.

R code (to this point):
########################################
data.raw=c(readLines("FSM_C_9m_ENE.inp"))
data.num <- as.numeric(data.raw)
data.num
library(MASS)
shape.mom = ((mean(data.num))/ (sd(data.num))^2
shape.mom
med.data = mean(data.num)
sd.data = sd(data.num)
med.data
sd.data
shape.mom = (med.data/sd.data)^2
shape.mom
scale.mom = (sd.data^2)/med.data
scale.mom
fitdistr(data.num,"gamma",list(shape=shape.mom,
scale=scale.mom),lower=0)

fitdistr() returns the following error:

" Error in optim(x = c(0.402707037, 0.403483333, 0.404383704,
2.432626667,  : 
  L-BFGS-B needs finite values of 'fn'"

Next thing I tried was to manually specify the negative log-likelihood
function and pass it straight to mle() (the method specified in Ricci's
tutorial on fitting distributions with R).  Basically, I got the same
result as using fitdistr().

Finally I tried using some R code I found from someone with a similar
problem back in 2003 from the archives of this mailing list:

R code
########################################
gamma.param1 <- shape.mom
gamma.param2 <- scale.mom
log.gamma.param1 <- log(gamma.param1)
log.gamma.param2 <- log(gamma.param2)


                                                       gammaLoglik <-
function(params, 
negative=TRUE){
   lglk <- sum(dgamma(data, shape=exp(params[1]), scale=exp(params[2]),
log=TRUE))
   if(negative)
      return(-lglk)
   else
      return(lglk)
}

optim.list <- optim(c(log.gamma.param1, log.gamma.param2), gammaLoglik)
gamma.param1 <- exp(optim.list$par[1])
gamma.param2 <- exp(optim.list$par[2])
#########################################

If I test this function using my sample data and the estimates of shape
and scale derived from the method of moments, gammaLogLike returns as
INF. I suspect the problem is that the zeros in the data are causing the
optim solver problems when it attempts to minimize the negative
log-likelihood function.

Can anyone suggest some advice on a work-around?  I have seen
suggestions online that a 'censoring' algorithm can allow one to use MLE
methods to estimate the gamma distribution for data with zero values
(Wilkes, 1990, Journal of Climate). I have not, however, found R code to
implement this, and, frankly, am not smart enough to do it myself... :-)

Any suggestions? Has anyone else run up against this and written code to
solve the problem?

Thanks in advance!

Aaron Fox
Senior Project Geologist, Golder Associates
+1 425 882 5484 || +1 425 736 3958 (mobile)
afox at golder.com || fracturedreservoirs.com

Peter Dalgaard

2008-Jun-11 20:40 UTC

head link

[R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros'

Fox, Aaron wrote:> Greetings, all
>
> I am having difficulty getting the fitdistr() function to return without
> an error on my data. Specifically, what I'm trying to do is get a
> parameter estimation for fracture intensity data in a well / borehole.
> Lower bound is 0 (no fractures in the selected data interval), and upper
> bound is ~ 10 - 50, depending on what scale you are conducting the
> analysis on.
>
> I read in the data from a text file, convert it to numerics, and then
> calculate initial estimates of the shape and scale parameters for the
> gamma distribution from moments. I then feed this back into the
> fitdistr() function.
>
> R code (to this point):
> ########################################
> data.raw=c(readLines("FSM_C_9m_ENE.inp"))
> data.num <- as.numeric(data.raw)
> data.num
> library(MASS)
> shape.mom = ((mean(data.num))/ (sd(data.num))^2
> shape.mom
> med.data = mean(data.num)
> sd.data = sd(data.num)
> med.data
> sd.data
> shape.mom = (med.data/sd.data)^2
> shape.mom
> scale.mom = (sd.data^2)/med.data
> scale.mom
> fitdistr(data.num,"gamma",list(shape=shape.mom,
> scale=scale.mom),lower=0)
>
> fitdistr() returns the following error:
>
> " Error in optim(x = c(0.402707037, 0.403483333, 0.404383704,
> 2.432626667,  : 
>   L-BFGS-B needs finite values of 'fn'"
>
> Next thing I tried was to manually specify the negative log-likelihood
> function and pass it straight to mle() (the method specified in Ricci's
> tutorial on fitting distributions with R).  Basically, I got the same
> result as using fitdistr().
>
> Finally I tried using some R code I found from someone with a similar
> problem back in 2003 from the archives of this mailing list:
>
> R code
> ########################################
> gamma.param1 <- shape.mom
> gamma.param2 <- scale.mom
> log.gamma.param1 <- log(gamma.param1)
> log.gamma.param2 <- log(gamma.param2)
>
>
>                                                        gammaLoglik <-
> function(params, 
> negative=TRUE){
>    lglk <- sum(dgamma(data, shape=exp(params[1]), scale=exp(params[2]),
> log=TRUE))
>    if(negative)
>       return(-lglk)
>    else
>       return(lglk)
> }
>
> optim.list <- optim(c(log.gamma.param1, log.gamma.param2), gammaLoglik)
> gamma.param1 <- exp(optim.list$par[1])
> gamma.param2 <- exp(optim.list$par[2])
> #########################################
>
> If I test this function using my sample data and the estimates of shape
> and scale derived from the method of moments, gammaLogLike returns as
> INF. I suspect the problem is that the zeros in the data are causing the
> optim solver problems when it attempts to minimize the negative
> log-likelihood function.
>
> Can anyone suggest some advice on a work-around?  I have seen
> suggestions online that a 'censoring' algorithm can allow one to
use MLE
> methods to estimate the gamma distribution for data with zero values
> (Wilkes, 1990, Journal of Climate). I have not, however, found R code to
> implement this, and, frankly, am not smart enough to do it myself... :-)
>
> Any suggestions? Has anyone else run up against this and written code to
> solve the problem?
>   It's fairly easy. You decide that the zeros really represent values less 
than "delta" (e.g. 0.5 if your data are integers), then replace 
dgamma(0,....) with pgamma(delta,...) in the likelihood. (And, BTW, the 
problem is not that the optimizers get in trouble, but rather that the 
log-likelihood *is* +/- Inf if there are zeros in data unless the shape 
parameter is exactly 1 -- the x^(a-1) factor in the gamma density causes 
this).


-- 
   O__  ---- Peter Dalgaard             ?ster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907

Ben Bolker

2008-Jun-11 20:58 UTC

head link

[R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros'

Fox, Aaron <Afox <at> golder.com> writes:
> 
> Greetings, all
> 
> I am having difficulty getting the fitdistr() function to return without
> an error on my data. Specifically, what I'm trying to do is get a
> parameter estimation for fracture intensity data in a well / borehole.
> Lower bound is 0 (no fractures in the selected data interval), and upper
> bound is ~ 10 - 50, depending on what scale you are conducting the
> analysis on.
> 
  You're right that the basic problem is with the gamma
distribution.  P(x,shape) dx = 0 (shape>1), 1 (shape=1), or Inf (shape<1).
A quick cheat would be to add a small number (0.001?) to your data,
try it again, and see how sensitive the estimate is to how small
that number is.  You could also try a negative binomial fit, which
is the discrete analog of the gamma (and hence won't have any
problem with zeros).  People who do beta regressions with
zero values in them often talk about adding a small Bayesian
'fudge factor' to deal with this problem ... (see 
psychology.anu.edu.au/people/smithson/details/betareg/Readme.pdf )

  Ben Bolker

Maybe Matching Threads

Search for more apparently analagous threads

R help - Jun 2008 - MLE Estimation of Gamma Distribution Parameters for data with 'zeros'

[R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros'

[R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros'

[R] MLE Estimation of Gamma Distribution Parameters for data with 'zeros'

Maybe Matching Threads