Dear R-help readers,
I am writing to you in order to ask you a few questions about distribution
fitting in R.
I am trying to find out whether the set of event interarrival times that I
am currently analyzing is distributed with a Gamma or General Pareto
distribution. The event arrival granularity is in minutes and interarrival
times are in seconds, so the values I have are 0, 60, 120, 180, and so on...
Here is what I did. Since several values of the interarrival_times array are
zero, to avoid numerical errors when calculating their logarithm, I set them
to 1E-10:
> # fix numerical values
> for (i in 1:length(interarrival_times)) {
+ if (interarrival_times[i] == 0)
+ interarrival_times[i] = 0.0000000001
+ }
Then I defined the negative log likelihood for the Gamma distribution:
> nll_gamma_log <- function(lambda_log, alfa_log) {
+ n <- length(interarrival_times)
+ x <- interarrival_times
+ -n*exp(alfa_log)*lambda_log + n*log(gamma(exp(alfa_log))) -
(exp(alfa_log)-1)*sum(log(x)) + exp(lambda_log)*sum(x)
+ }
and passed it to the mle function:
> est_gamma <- mle(minuslog = nll_gamma_log, start = list(lambda_log = 0,
alfa_log=0))
There were 50 or more warnings (use warnings() to see the first 50)
of course I got many "value out of range" warnings:
> warnings()
Warning messages:
1: In log(gamma(exp(log_alfa))) ... : value out of range in 'gammafn'
[...]
7: In gamma(exp(log_alfa)) ... : NaNs produced
[...]
50: In log(gamma(exp(log_alfa))) ... : value out of range in 'gammafn'
but that was expected, right?
The real problems come out when I try to fit a General Pareto distribution.
Using the same method, I calculated the negative log likelihood function for
the General Pareto distribution:
> # negative log-likelihood function for general pareto distribution
> nll_gpd <- function(xi, log_sigma, mu) { # shape, scale, location
+ n <- length(interarrival_times)
+ x <- interarrival_times
+ cat("xi:",xi,"; log_sigma:",log_sigma,";
mu:",mu,"\n")
+ n*log_sigma + (xi+1)*sum(log(1+xi*(x-mu)/exp(log_sigma)))/xi
+ }
and passed it to the mle function. However, this time I get some errors:
> est_gpd <- mle(minuslog = nll_gpd, start = list(xi = 1, log_sigma = 1,
mu
= 1))
xi: 1 ; log_sigma: 1 ; mu: 1
xi: 1.001 ; log_sigma: 1 ; mu: 1
xi: 0.999 ; log_sigma: 1 ; mu: 1
xi: 1 ; log_sigma: 1.001 ; mu: 1
xi: 1 ; log_sigma: 0.999 ; mu: 1
xi: 1 ; log_sigma: 1 ; mu: 1.001
xi: 1 ; log_sigma: 1 ; mu: 0.999
xi: 52007.84 ; log_sigma: 13414.95 ; mu: 4241.565
xi: 10402.37 ; log_sigma: 2683.789 ; mu: 849.113
xi: 18.08721 ; log_sigma: 3.862682 ; mu: 2.627865
xi: 18.08721 ; log_sigma: 3.860682 ; mu: 2.627865
Error in optim(start, f, method = method, hessian = TRUE, ...) :
non-finite finite-difference value [2]
In addition: There were 50 or more warnings (use warnings() to see the first
50)
I also get many warnings:
> warnings()
Warning messages:
1: In base::log(x, base) ... : NaNs produced
2: In base::log(x, base) ... : NaNs produced
[...]
50: In base::log(x, base) ... : NaNs produced
I really don't know how to fix this problem. Do you have any suggestions?
Thank you very much in advance.
Best regards,
Mauro Tortonesi
--
Mauro Tortonesi, Ph.D.
Research Assistant
Distributed Systems Research Group
Engineering Department
University of Ferrara
[[alternative HTML version deleted]]