Greetings, everybody. Can I ask some glm questions?
1. How do you find out -2*lnL(saturated model)?
In the output from glm, I find:
Null deviance: which I think is -2[lnL(null) - lnL(saturated)]
Residual deviance: -2[lnL(fitted) - lnL(saturated)]
The Null model is the one that includes the constant only (plus offset
if specified). Right?
I can use the Null and Residual deviance to calculate the "usual model
Chi-squared" statistic
-2[lnL(null) - lnL(fitted)].
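[Editor's note: that subtraction can be checked directly on a fitted model. A minimal sketch, using the Poisson example data from example(glm) (Dobson, 1990); the object names chisq and df are mine, not glm output:]

```r
# Sketch: recovering the "usual model Chi-squared" from a glm fit.
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

chisq <- fit$null.deviance - fit$deviance   # -2[lnL(null) - lnL(fitted)]
df    <- fit$df.null - fit$df.residual      # parameters added beyond the constant
pchisq(chisq, df, lower.tail = FALSE)       # p-value for the overall model test
```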
But, just for curiosity's sake, what's the saturated model's -2lnL?
2. Why no 'scaled deviance' in output? Or, how are you supposed to tell
if there is over-dispersion?
I just checked, and SAS gives us both the scaled and nonscaled deviance.
I have read the Venables & Ripley (MASS 4ed) chapter on GLM. I believe
I understand the cautionary point about overdispersion toward the end
(p. 408). Since I'm comparing lots of other books at the moment, I
believe I see people using the practice that is being criticized. The
Pearson Chi-square based estimate of dispersion is recommended and one
uses an F test to decide if the fitted model is significantly worse than
the saturated model. But don't we still assess over-dispersion by
looking at the scaled deviance (after it is calculated properly)?
Can I make a guess why glm does not report scaled deviance? Are the glm
authors trying to discourage us from making the lazy assessment in which
one concludes over-dispersion is present if the scaled deviance exceeds
the degrees of freedom?
3. When I run "example(glm)", at the end there's a Gamma model and the
printout says:
(Dispersion parameter for Gamma family taken to be 0.001813340)
I don't find an estimate for the Gamma distribution's shape parameter in
the output. I'm uncertain what the reported dispersion parameter refers
to. It's the denominator (phi) in the exponential family formula, isn't
it?

          y*theta - c(theta)
    exp [ ------------------ - h(y,phi) ]
                 phi
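[Editor's note: the Gamma fit in question uses the clotting data from example(glm). The dispersion that summary.glm prints can be extracted directly; a sketch, in which treating 1/phi as a rough estimate of the gamma shape parameter is my own moment-style reading, not glm output:]

```r
# Sketch: the clotting-time Gamma model from example(glm).
clotting <- data.frame(
  u    = c(5, 10, 15, 20, 30, 40, 60, 80, 100),
  lot1 = c(118, 58, 42, 35, 27, 25, 21, 19, 18))
fit <- glm(lot1 ~ log(u), data = clotting, family = Gamma)

phi <- summary(fit)$dispersion   # the value printed as the dispersion parameter
phi                              # compare with the value printed by summary(fit)
1 / phi                          # rough moment-style estimate of the gamma shape
```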
4. For GLM teaching purposes, can anybody point me at some good examples
of GLM that do not use Normal, Poisson, Negative Binomial, and/or
Logistic Regression? I want to justify the effort to understand the GLM
as a framework. We have already in previous semesters followed the
usual "econometric" approach in which OLS, Poisson/Count, and Logistic
regression are treated as special cases. Some of the students don't see
any benefit from tackling the GLM's new notation/terminology.
What I'm lacking is some persuasive evidence that the effort to master
the details of the GLM is worthwhile. I could really use some data and
reference articles that have applications of Gamma distributed (or
exponential) variables, say, or Weibull, or whatever.
I've been dropping my course notes in this directory:
http://lark.cc.ku.edu/~pauljohn/ps909/AdvancedRegression.
The documents GLM1 and GLM2 are pretty good theoretical surveys <patting
self on back/>. But I need to work harder to justify the effort by
providing examples.
I'd appreciate any feedback, if you have any. And, of course, if you
want to take these documents and use them for your own purposes, be my
guest.
5. Is it possible to find all methods that an object inherits?
I found out by reading the source code for J Fox's car package that
model.matrix() returns the X matrix of coded input variables, so one can do fun
things like calculate robust standard errors and such. That's really useful,
because before I found that, I was recoding up a storm to re-create the X matrix
used in a model.
Is there a direct way to find a list of all the other methods that would apply
to an object?
--
Paul E. Johnson email: pauljohn at ku.edu
Dept. of Political Science http://lark.cc.ku.edu/~pauljohn
1541 Lilac Lane, Rm 504
University of Kansas Office: (785) 864-9086
Lawrence, Kansas 66044-3177 FAX: (785) 864-5700
Dear Paul

Here are some attempts at your questions. I hope it's of some help.

On Tuesday, Mar 16, 2004, at 06:00 Europe/London, Paul Johnson wrote:

> Greetings, everybody. Can I ask some glm questions?
>
> 1. How do you find out -2*lnL(saturated model)?
>
> In the output from glm, I find:
>
> Null deviance: which I think is -2[lnL(null) - lnL(saturated)]
> Residual deviance: -2[lnL(fitted) - lnL(saturated)]
>
> The Null model is the one that includes the constant only (plus offset
> if specified). Right?
>
> I can use the Null and Residual deviance to calculate the "usual model
> Chi-squared" statistic
> -2[lnL(null) - lnL(fitted)].
>
> But, just for curiosity's sake, what's the saturated model's -2lnL?

It's important to remember that lnL is defined only up to an additive
constant. For example a Poisson model has lnL contributions
-mu + y*log(mu) + constant, and the constant is arbitrary. The
differencing in the deviance calculation eliminates it. What constant
would you like to use?

> 2. Why no 'scaled deviance' in output? Or, how are you supposed to
> tell if there is over-dispersion?
>
> I just checked, and SAS gives us both the scaled and nonscaled
> deviance.
>
> I have read the Venables & Ripley (MASS 4ed) chapter on GLM. I
> believe I understand the cautionary point about overdispersion toward
> the end (p. 408). Since I'm comparing lots of other books at the
> moment, I believe I see people using the practice that is being
> criticized. The Pearson Chi-square based estimate of dispersion is
> recommended and one uses an F test to decide if the fitted model is
> significantly worse than the saturated model. But don't we still
> assess over-dispersion by looking at the scaled deviance (after it is
> calculated properly)?
>
> Can I make a guess why glm does not report scaled deviance? Are the
> glm authors trying to discourage us from making the lazy assessment in
> which one concludes over-dispersion is present if the scaled deviance
> exceeds the degrees of freedom?

I am unclear what you are asking here. I assume by "scaled deviance"
you mean deviance divided by phi, a (known) scale parameter? (I'm
sorry, I don't know SAS's definition.) In many applications (eg
binomial, Poisson) deviance and scaled deviance are the same thing,
since phi is 1. Yes, if you wanted to judge overdispersion relative to
some other value of phi you would scale the deviance. What other value
of phi would you like?

> 3. When I run "example(glm)", at the end there's a Gamma model and the
> printout says:
>
> (Dispersion parameter for Gamma family taken to be 0.001813340)
>
> I don't find an estimate for the Gamma distribution's shape parameter
> in the output. I'm uncertain what the reported dispersion parameter
> refers to. It's the denominator (phi) in the exponential family
> formula, isn't it?
>
>           y*theta - c(theta)
>     exp [ ------------------ - h(y,phi) ]
>                  phi

Phi is the squared coefficient of variation, ie variance/(mean^2).
Thus it is a shape parameter. If you are used to some other
parameterization of the gamma family, just express the mean and
variance in that parameterization to see the relation between your
parameters and phi.

> 4. For GLM teaching purposes, can anybody point me at some good
> examples of GLM that do not use Normal, Poisson, Negative Binomial,
> and/or Logistic Regression? I want to justify the effort to
> understand the GLM as a framework. We have already in previous
> semesters followed the usual "econometric" approach in which OLS,
> Poisson/Count, and Logistic regression are treated as special cases.
> Some of the students don't see any benefit from tackling the GLM's new
> notation/terminology.

McCullagh and Nelder (1989) has some I believe, eg gamma models. Also
quasi-likelihood models, such as the Wedderburn (1974) approach to
analysis of 2-component compositional data (the leaf blotch example in
McC&N).

On the more general point: yes, if all that students need to know is
OLS, Poisson rate models and logistic regression, then GLM is overkill.
The point, surely, is that GLM opens up a way of thinking in which mean
function and variance function are specified separately? This becomes
most clear through a presentation of GLMs via quasi-likelihood (as the
"right" generalization of weighted least squares) rather than via the
exponential-family likelihoods. In my opinion.

> 5. Is it possible to find all methods that an object inherits?
>
> I found out by reading the source code for J Fox's car package that
> model.matrix() returns the X matrix of coded input variables, so one
> can do fun things like calculate robust standard errors and such.
> That's really useful, because before I found that, I was recoding up a
> storm to re-create the X matrix used in a model.
>
> Is there a direct way to find a list of all the other methods that
> would apply to an object?

methods(class = "glm")
methods(class = "lm")

is probably not as direct as you had in mind! But it's a start.

Best wishes,
David
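[Editor's note: David's point about the arbitrary constant can be checked numerically. For a Poisson model, the residual deviance equals 2*sum[y*log(y/mu) - (y - mu)], a saturated-minus-fitted difference in which any additive constant in lnL has cancelled. A sketch, again with the example(glm) Poisson data:]

```r
# Sketch: the Poisson residual deviance as a saturated-vs-fitted difference.
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

mu  <- fitted(fit)
dev <- 2 * sum(counts * log(counts / mu) - (counts - mu))
all.equal(dev, deviance(fit))   # TRUE: the arbitrary constant has cancelled
```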
David Firth wrote (in response to a question from Paul Johnson):

> On the more general point: yes, if all that students need to know is
> OLS, Poisson rate models and logistic regression, then GLM is
> overkill.

I couldn't agree less. The glm (not GLM!) framework gives a coherence
to the structure and changes a collection of ad hoc (and thereby
essentially meaningless cook-book) techniques into a single meaningful
technique: a parameter (the mean) of a distribution is a transformation
of a linear function of some predictors. One seeks to estimate the
linear coefficients via maximum likelihood. In a broad array of
circumstances the maximization can be carried out by the glm() function
(using iteratively reweighted least squares). The process is quick and
efficient and the notation is about as transparent as can be imagined.

cheers,

Rolf Turner
rolf at math.unb.ca