Mark Difford
2007-Jan-09 11:13 UTC
[R] contingency table analysis; generalized linear model
Dear List,

I would appreciate help on the following matter:

I am aware that higher dimensional contingency tables can be
analysed using either log-linear models or as a poisson regression
using a generalized linear model:

log-linear:
    loglm(~Age+Site, data=xtabs(~Age+Site, data=SSites.Rev, drop.unused.levels=T))

GLM:
    glm.table <- as.data.frame(xtabs(~Age+Site, data=SSites.Rev, drop.unused.levels=T))
    glm(Freq ~ Age + Site, data=glm.table, family='poisson')

where Site is a factor and Age is cast as a factor by xtabs() and
treated as such.

**Question**:
Is it acceptable to step away from contingency table analysis by
recasting Age as a numerical variable, and redoing the analysis as:

    glm(Freq ~ as.numeric(Age) + Site, data=glm.table, family='poisson')

My reasons for wanting to do this are to be able to include
non-linear terms in the model, using say restricted or natural cubic
splines.

Thank you in advance for your help.

Regards,
Mark Difford.

---------------------------------------------------------------
Mark Difford
Ph.D. candidate, Botany Department,
Nelson Mandela Metropolitan University,
Port Elizabeth, SA.
Trevor Hastie
2007-Jan-10 15:06 UTC
[R] contingency table analysis; generalized linear model
> Date: Tue, 9 Jan 2007 11:13:41 +0000 (GMT)
> From: Mark Difford <mark_difford@yahoo.co.uk>
> Subject: Re: [R] contingency table analysis; generalized linear model
>
> Dear List,
>
> I would appreciate help on the following matter:
>
> I am aware that higher dimensional contingency tables can be
> analysed using either log-linear models or as a poisson regression
> using a generalized linear model:
>
> log-linear:
> loglm(~Age+Site, data=xtabs(~Age+Site, data=SSites.Rev,
> drop.unused.levels=T))
>
> GLM:
> glm.table <- as.data.frame(xtabs(~Age+Site, data=SSites.Rev,
> drop.unused.levels=T))
> glm(Freq ~ Age + Site, data=glm.table, family='poisson')
>
> where Site is a factor and Age is cast as a factor by xtabs() and
> treated as such.
>
> **Question**:
> Is it acceptable to step away from contingency table analysis by
> recasting Age as a numerical variable, and redoing the analysis as:
>
> glm(Freq ~ as.numeric(Age) + Site, data=glm.table, family='poisson')
>
> My reasons for wanting to do this are to be able to include
> non-linear terms in the model, using say restricted or natural cubic
> splines.
>
> Thank you in advance for your help.
> Regards,
> Mark Difford.
>
> ---------------------------------------------------------------
> Mark Difford
> Ph.D. candidate, Botany Department,
> Nelson Mandela Metropolitan University,
> Port Elizabeth, SA.

Yes it is, and it is often the preferred way to view the analysis.
In this case it looks like Freq is measuring something like species
abundance, and it is natural to model this as a Poisson count via a
log-link glm. As such you are free to include any reasonable
functions of your predictors in modeling the mean.

Log-linear models are typically presented as ways of analyzing
dependence between categorical variables, when represented as
multi-way tables. The appropriate multinomial models, conditioning
on certain marginals, happen to be equivalent to Poisson glms with
appropriate terms included.
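The equivalence between the log-linear fit and the Poisson glm can be checked numerically. The following is a minimal sketch only: SSites.Rev is not available in the thread, so `toy` is a hypothetical simulated stand-in, and the object names `tab`, `fit.loglm` and `fit.glm` are illustrative assumptions. It fits the independence model both ways and compares the likelihood-ratio statistic with the residual deviance.

```r
library(MASS)  # provides loglm(); a recommended package shipped with R

# Hypothetical stand-in for SSites.Rev: one row per observed individual
set.seed(1)
toy <- data.frame(
  Age  = sample(1:5, 200, replace = TRUE),
  Site = factor(sample(c("A", "B", "C"), 200, replace = TRUE))
)

tab <- xtabs(~ Age + Site, data = toy)

# Independence model fitted as a log-linear model on the table
fit.loglm <- loglm(~ Age + Site, data = tab)

# The same model fitted as a Poisson regression on the cell counts
glm.table <- as.data.frame(tab)
fit.glm   <- glm(Freq ~ Age + Site, data = glm.table, family = poisson)

# Likelihood-ratio statistic vs. residual deviance, and their degrees of freedom
c(loglm = fit.loglm$lrt, glm = deviance(fit.glm))
c(loglm = fit.loglm$df,  glm = df.residual(fit.glm))
```

The two fits should agree up to numerical tolerance, which is why terms can be added to the glm formulation without leaving the Poisson framework.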
I would suggest in your data preparation that you

    glm.table[,"Age"] <- as.numeric(glm.table[,"Age"])

at the start, so that now you can think of your data in the right way.

Trevor Hastie

-------------------------------------------------------------------
Trevor Hastie                                  hastie@stanford.edu
Professor & Chair, Department of Statistics, Stanford University
Phone: (650) 725-2231 (Statistics)          Fax: (650) 725-8977
       (650) 498-5233 (Biostatistics)       Fax: (650) 725-6951
URL: http://www-stat.stanford.edu/~hastie
address: room 104, Department of Statistics, Sequoia Hall
         390 Serra Mall, Stanford University, CA 94305-4065
--------------------------------------------------------------------
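A sketch of how the numeric recoding and a spline term might look in practice follows. It rests on assumptions: `toy` is a hypothetical simulated stand-in for SSites.Rev, the unevenly spaced ages are invented for illustration, and `Age.code`, `Age.num`, `fit.lin` and `fit.ns` are illustrative names. One point worth checking on real data: as.numeric() applied to a factor returns the integer level codes, which match the recorded ages only when the ages are the consecutive integers 1, 2, 3, ...; converting through as.character() recovers the recorded values in the general case.

```r
library(splines)  # provides ns(); part of the base R distribution

# Hypothetical stand-in for SSites.Rev, with unevenly spaced ages
set.seed(2)
toy <- data.frame(
  Age  = sample(c(1, 2, 3, 5, 8, 12), 300, replace = TRUE),
  Site = factor(sample(c("A", "B", "C"), 300, replace = TRUE))
)
glm.table <- as.data.frame(xtabs(~ Age + Site, data = toy))

# as.numeric() on a factor gives the level codes 1, 2, 3, ...;
# going through as.character() recovers the recorded ages instead
glm.table$Age.code <- as.numeric(glm.table$Age)
glm.table$Age.num  <- as.numeric(as.character(glm.table$Age))

# Age as a linear term on the log scale, then as a natural cubic spline
fit.lin <- glm(Freq ~ Age.num + Site, data = glm.table, family = poisson)
fit.ns  <- glm(Freq ~ ns(Age.num, df = 3) + Site, data = glm.table,
               family = poisson)

# Analysis of deviance comparing the linear and spline fits
anova(fit.lin, fit.ns, test = "Chisq")
```

The choice of df = 3 is arbitrary here; with only a handful of distinct ages the spline cannot use more degrees of freedom than the factor coding of Age would.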