thr3ads.net - R help - [R] glm() function not finding the maximum [Apr 2002]

Richard Nixon

2002-Apr-22 15:51 UTC

[R] glm() function not finding the maximum

Hello,

I have found a problem with using the glm function with a gamma family.

I have a vector of data, assumed to be generated by a gamma distribution.
The parameters of this gamma distribution are estimated in two ways (i)
using the glm() function, (ii) "by hand", using the optim() function.

I find that the -2*log-likelihood at the maximum found by (i) is
substantially larger than that found by (ii), i.e. the glm() function is
not finding the maximum.

This is somewhat of a pathological example, as the data set is highly
skewed and contains a couple of outliers.

I've tested this in S+ and the same problem is there too.

Is this cause for concern, or is my data set just a "nasty" one to deal
with?

I am really impressed with the optim() function. Indeed, it is the reason
why I switched to R from Splus. The Splus analogue was very slow, and
didn't find the maximum.

The data set and code for the two methods of estimation are included
below. I don't think I am making a mistake here. Sorry if I have.

Thanks

Richard

> gamma1(data) #uses the glm() function
$loglik
[1] 875.4274

$par
[1] 9.572403e-02 4.345771e+03

> gamma2(data) #"by hand" using optim()
$loglik
[1] 793.3913

$par
[1]   0.518145 802.854297

#Data set
data <- c(51.47, 210.19, 49.55, 61.93, 60.61, 744.57, 338.59, 133.93,
191.57, 111.43, 432.83, 185.23, 155.61, 84.72, 120.2, 15.33,
77.05, 115.77, 25.23, 657.94, 108.39, 61.08, 142.42, 87.86, 272.87,
213.78, 65.23, 102.45, 58.16, 176.58, 76.58, 434.12, 362.35,
102.53, 103.6, 25.23, 97.19, 88.52, 118.55, 151.9, 2.7, 156.41,
21.79, 272.27, 23.16, 32.07, 6325.23, 92.37, 8340.04, 51.08,
55.59, 94.08, 69.98, 554.13, 104.88, 170.15, 945.1, 143.52)

#Fits data to a gamma distribution using glm()
gamma1 <- function(data){
  n <- length(data)
  m <- summary(glm(data~1, family=Gamma(link=identity)))
  shape <- 1/as.numeric(m$disp)
  scale <- as.numeric(m$coeff[1]*m$disp)

  dev.res <- -2*log(dgamma(data,shape=shape,scale=scale))
  loglik <- sum(dev.res)  #actually -2 * log like

  list(loglik=loglik,par=c(shape,scale))
}

#Fits data to a gamma distribution "by hand" using optim()
gamma2 <- function(data){
  n <- length(data)
  m <- summary(glm(data~1, family=Gamma))
  shape <- 1/as.numeric(m$disp)

  #L = -Log likelihood, with x[1] = shape and x[2] = mean
  L <- function(x){
    -(-n*log(gamma(x[1])) + n*x[1]*log(x[1]/x[2]) +
      (x[1]-1)*sum(log(data)) - x[1]/x[2]*sum(data))
  }
  start <- c(shape, mean(data))
  parscale <- start
  fit <- optim(start,L,method="L-BFGS-B",lower=c(shape/100,0),
               upper=c(Inf,Inf),control=list(parscale=parscale))
  shape <- fit$par[1]
  mu <- fit$par[2]
  scale <- mu/shape

  dev.res <- -2*log(dgamma(data,shape=shape,scale=scale))
  loglik <- sum(dev.res)  #actually -2 * log like

  list(loglik=loglik,par=c(shape,scale))
}
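
As a sanity check on the hand-written likelihood L in gamma2, the following sketch (not part of the original post) compares it with the sum of dgamma() log densities. The helper name negloglik, the toy vector y and the parameter value x are all made up for illustration, and lgamma() stands in for log(gamma()) only because it is less prone to overflow at large shapes.

```r
# Negative log-likelihood of a gamma sample, parametrised by
# x[1] = shape and x[2] = mean (so scale = mean/shape), as in gamma2.
negloglik <- function(x, y) {
  n <- length(y)
  -(-n * lgamma(x[1]) + n * x[1] * log(x[1] / x[2]) +
      (x[1] - 1) * sum(log(y)) - x[1] / x[2] * sum(y))
}

# Toy data and an arbitrary parameter value (both hypothetical).
y <- c(2.7, 15.33, 51.47, 210.19, 744.57)
x <- c(0.8, 150)

by.hand   <- negloglik(x, y)
by.dgamma <- -sum(dgamma(y, shape = x[1], scale = x[2] / x[1], log = TRUE))

# The two values should agree up to rounding error.
print(all.equal(by.hand, by.dgamma))
```

If the two agree, any gap between the glm() and optim() results comes from the parameter estimates being plugged in, not from the likelihood formula itself.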


--
Dr. Richard Nixon
MRC Biostatistics Unit, Institute of Public Health,
Robinson Way, Cambridge, CB2 2SR
mrc-bsu.cam.ac.uk/personal/richard
Tel: +44 (0)1223 330382, Fax: +44 (0)1223 330388

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Brett Presnell

2002-Apr-22 19:46 UTC

[R] glm() function not finding the maximum

In message <Pine.GSO.4.44.0204221641030.4377-100000 at moeran> you write:
>
> #Fits data to a gamma distribution using glm()
> gamma1 <- function(data){
> n <- length(data)
> m <- summary(glm(data~1, family=Gamma(link=identity)))
> shape <- 1/as.numeric(m$disp)

You're not using the MLE of the gamma shape parameter.  See the
function gamma.shape in library(MASS).
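
A minimal sketch of that suggestion, assuming the MASS package is installed. The data here are simulated purely for illustration (they are not the data from the thread), and the variable names are arbitrary.

```r
library(MASS)  # provides gamma.shape()

set.seed(1)                               # simulated data, illustration only
y <- rgamma(200, shape = 2, scale = 50)

fit <- glm(y ~ 1, family = Gamma)

shape.mom <- 1 / summary(fit)$dispersion  # what the quoted gamma1 uses
gs        <- gamma.shape(fit)             # ML estimate of the shape
shape.mle <- gs$alpha

print(c(moment = shape.mom, mle = shape.mle, se.mle = gs$SE))
```

For well-behaved data like this the two estimates should be close; the thread's data set is one where they are not.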

-- 
Brett Presnell
Department of Statistics
University of Florida
stat.ufl.edu/~presnell

Richard Nixon

2002-Apr-24 17:38 UTC

[R] glm() function not finding the maximum

Hello,

Thanks to Brett Presnell and Brian Ripley I see the error of my ways.

I now conclude one of two things. Either:

(1)
model <- glm(data~1, family=Gamma)
summary(model)$dispersion

is not the same thing as dispersion as defined by McCullagh and Nelder
(M+N) (Generalized linear models). The shape parameter of a gamma
distribution is 1/(M+N dispersion), but 1/summary(model)$dispersion is not
the shape parameter.

or (2)
summary(model)$dispersion is supposed to equal the M+N dispersion, but my
pathological data set messes it up.

note that the moment estimator of the shape parameter
mean(data)^2/var(data) = 0.09572403
is close to
1/summary(model)$dispersion = 0.09622557
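
This closeness is what one would expect if, as the summary.glm documentation describes, the dispersion is estimated by the Pearson chi-squared statistic divided by the residual degrees of freedom. For an intercept-only gamma model that quantity reduces to var(y)/mean(y)^2 once the fitted mean equals the sample mean. A small sketch on simulated data (not the data above; all names are illustrative):

```r
set.seed(1)                               # simulated data, illustration only
y <- rgamma(200, shape = 2, scale = 50)

fit <- glm(y ~ 1, family = Gamma)

# Pearson chi-squared statistic over the residual degrees of freedom.
pearson <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# Moment estimate of the squared coefficient of variation.
cv2 <- var(y) / mean(y)^2

print(c(summary = summary(fit)$dispersion, pearson = pearson, cv2 = cv2))
```

All three numbers should agree closely once the fit has converged, which makes 1/dispersion a moment estimator of the shape and not a maximum likelihood one.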

Brian Ripley also points out that optim() is now part of the latest MASS
library for splus6.

Thanks
Richard

> library(MASS)
> gamma1(data) #uses the glm() function
$loglik1
[1] 875.0035

$loglik2
[1] 793.3913       # Now agrees with log likelihood below

$par
[1]   0.09622557   0.51814501 415.99465517
# $par[2]=shape = gamma.shape(m)$alpha is correct
# $par[1]=shape = 1/summary(m)$disp is not correct

> gamma2(data) #"by hand" using optim()
$loglik
[1] 793.3913

$par
[1]   0.518145 415.994662

data <- c(51.47, 210.19, 49.55, 61.93, 60.61, 744.57, 338.59, 133.93,
191.57, 111.43, 432.83, 185.23, 155.61, 84.72, 120.2, 15.33,
77.05, 115.77, 25.23, 657.94, 108.39, 61.08, 142.42, 87.86, 272.87,
213.78, 65.23, 102.45, 58.16, 176.58, 76.58, 434.12, 362.35,
102.53, 103.6, 25.23, 97.19, 88.52, 118.55, 151.9, 2.7, 156.41,
21.79, 272.27, 23.16, 32.07, 6325.23, 92.37, 8340.04, 51.08,
55.59, 94.08, 69.98, 554.13, 104.88, 170.15, 945.1, 143.52)

#Fits data to a gamma distribution using glm()
gamma1 <- function(data){
  m <- glm(data~1, family=Gamma)
  shape1 <- 1/summary(m)$disp
  shape2 <- gamma.shape(m)$alpha
  mu <- mean(data)

  dev.res1 <- -2*log(dgamma(data,shape=shape1,scale=mu/shape1))
  loglik1 <- sum(dev.res1)  #actually -2 * log like
  dev.res2 <- -2*log(dgamma(data,shape=shape2,scale=mu/shape2))
  loglik2 <- sum(dev.res2)  #actually -2 * log like

  list(loglik1=loglik1,loglik2=loglik2,par=c(shape1,shape2,mu))
}

#Fits data to a gamma distribution "by hand" using optim()
gamma2 <- function(data){
  n <- length(data)
  m <- glm(data~1, family=Gamma)
  shape <- gamma.shape(m)$alpha

  #L = -Log likelihood, with x[1] = shape and x[2] = mean
  L <- function(x){
    -(-n*log(gamma(x[1])) + n*x[1]*log(x[1]/x[2]) +
      (x[1]-1)*sum(log(data)) - x[1]/x[2]*sum(data))
  }
  start <- c(shape, mean(data))
  parscale <- start
  fit <- optim(start,L,method="L-BFGS-B",lower=c(shape/100,0),
               upper=c(Inf,Inf),control=list(parscale=parscale))
  shape <- fit$par[1]
  mu <- fit$par[2]

  dev.res <- -2*log(dgamma(data,shape=shape,scale=mu/shape))
  loglik <- sum(dev.res)  #actually -2 * log like

  list(loglik=loglik,par=c(shape,mu))
}



--
Dr. Richard Nixon
MRC Biostatistics Unit, Institute of Public Health,
Robinson Way, Cambridge, CB2 2SR
mrc-bsu.cam.ac.uk/personal/richard
Tel: +44 (0)1223 330382, Fax: +44 (0)1223 330388


Richard Nixon

2002-May-01 11:40 UTC

[R] glm() function not finding the maximum

For those of you who have been following my random ranting about the
glm(, family=Gamma) function, for completeness here are my final
(hopefully!) remarks.

Thanks to Brett Presnell for explaining how the glm function estimates its
parameters.

The dispersion parameter reported by summary() for a Gamma glm is the same
thing as dispersion as defined by McCullagh and Nelder (Generalized linear
models). It is estimated by a moment estimator.

If one requires a maximum likelihood estimator, then one can use the
gamma.shape() function from the MASS library. shape=1/dispersion.

These two methods will generally yield similar results, but for highly
skewed data, like that given in my example, they can differ markedly.

Code showing this is included below.

Hope this is of some use to someone, and has iterated close enough to how
this function actually works.

Richard.

> gamma(data)
$loglik.mom
[1] 875.0035 #-2log likelihood using a moment estimator for shape

$loglik.mle
[1] 793.3913 #-2log likelihood using a mle estimator for shape

$par
[1]   0.09622557   0.51814501 415.99465517
#par[1] = moment estimator for shape
#par[2] = mle estimator for shape
#par[3] = mean

data <- c(51.47, 210.19, 49.55, 61.93, 60.61, 744.57, 338.59, 133.93,
191.57, 111.43, 432.83, 185.23, 155.61, 84.72, 120.2, 15.33,
77.05, 115.77, 25.23, 657.94, 108.39, 61.08, 142.42, 87.86, 272.87,
213.78, 65.23, 102.45, 58.16, 176.58, 76.58, 434.12, 362.35,
102.53, 103.6, 25.23, 97.19, 88.52, 118.55, 151.9, 2.7, 156.41,
21.79, 272.27, 23.16, 32.07, 6325.23, 92.37, 8340.04, 51.08,
55.59, 94.08, 69.98, 554.13, 104.88, 170.15, 945.1, 143.52)

#Fits data to a gamma distribution using glm()
#(note: while defined, this function masks base::gamma())
gamma <- function(data){
  require(MASS, quietly=TRUE)
  model <- glm(data~1, family=Gamma)
  shape.mom <- 1/summary(model)$dispersion #this is a moment estimator
  shape.mle <- gamma.shape(model)$alpha    #this is a mle estimator
  mu <- mean(data)

  dev.res.mom <- -2*log(dgamma(data,shape=shape.mom,scale=mu/shape.mom))
  loglik.mom <- sum(dev.res.mom)  #actually -2 * log like
  dev.res.mle <- -2*log(dgamma(data,shape=shape.mle,scale=mu/shape.mle))
  loglik.mle <- sum(dev.res.mle)  #actually -2 * log like

  list(loglik.mom=loglik.mom,loglik.mle=loglik.mle,par=c(shape.mom,shape.mle,mu))
}
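
For readers without MASS to hand, the same comparison can be sketched in base R. With the mean fixed at its MLE (the sample mean), the shape MLE solves log(a) - digamma(a) = log(mean(y)) - mean(log(y)), which uniroot() can handle. This is only an illustrative sketch, not how gamma.shape() is implemented; the names m2ll and score and the search interval are assumptions made for the example.

```r
y <- c(51.47, 210.19, 49.55, 61.93, 60.61, 744.57, 338.59, 133.93,
       191.57, 111.43, 432.83, 185.23, 155.61, 84.72, 120.2, 15.33,
       77.05, 115.77, 25.23, 657.94, 108.39, 61.08, 142.42, 87.86, 272.87,
       213.78, 65.23, 102.45, 58.16, 176.58, 76.58, 434.12, 362.35,
       102.53, 103.6, 25.23, 97.19, 88.52, 118.55, 151.9, 2.7, 156.41,
       21.79, 272.27, 23.16, 32.07, 6325.23, 92.37, 8340.04, 51.08,
       55.59, 94.08, 69.98, 554.13, 104.88, 170.15, 945.1, 143.52)

mu <- mean(y)

# -2 * log-likelihood for a gamma with the given shape and mean mu.
m2ll <- function(shape) {
  -2 * sum(dgamma(y, shape = shape, scale = mu / shape, log = TRUE))
}

# Moment estimator of the shape.
shape.mom <- mean(y)^2 / var(y)

# ML estimator: root of the score equation for the shape.
s <- log(mean(y)) - mean(log(y))
score <- function(a) log(a) - digamma(a) - s
shape.mle <- uniroot(score, interval = c(1e-3, 1e3), tol = 1e-10)$root

print(c(shape.mom = shape.mom, shape.mle = shape.mle))
print(c(m2ll.mom = m2ll(shape.mom), m2ll.mle = m2ll(shape.mle)))
```

The printed values can be compared with the shapes and -2 log-likelihoods reported earlier in this thread.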

--
Dr. Richard Nixon
MRC Biostatistics Unit, Institute of Public Health,
Robinson Way, Cambridge, CB2 2SR
mrc-bsu.cam.ac.uk/personal/richard
Tel: +44 (0)1223 330382, Fax: +44 (0)1223 330388

