Ben Kenward
2010-Nov-30 09:16 UTC
[R] researcher with highly skewed data set seeks help finding practical GLMM tutorial
Hi!

I am a psychologist who suspects that the only sensible way to analyse a particular data set is to use generalised linear mixed models. I am hoping that someone might be able to point me in the right direction to find some very practical, hands-on documentation that could talk me through actually doing such an analysis.

So far in my searches the most useful document I have turned up is Bolker et al. (2008, TREE), "Generalized linear mixed models: a practical guide for ecology and evolution". As a general guide it doesn't give enough practical information about how to get the job done. The R documentation is obviously practical, but doesn't help to decide what kind of analysis is appropriate. Apart from those sources I am mainly finding quite theoretical treatments that go over my head, for example: http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-software/reviewr.pdf.

I am moderately competent programming in R, having coded custom permutation tests before (which, in contrast to GLMM, I find intuitive).

In case anyone is kind enough to give me any specific pointers, here is the nature of my data set. With an N of 42 subjects, I have a highly left skewed (about half the data points are zero) frequency variable as dependent variable. This variable is measured in each subject in three different task types. There is furthermore a context variable with two levels. Each task was administered in each context, but not for every single subject.

So the design is quite simple - two fixed factors (task and context), one random factor (subject), and an untransformably skewed dependent variable. I might want to add some additional fixed factors (age group) in future, but for now I would like to keep it simple. I guess this is straightforward for those in the know.

Any help at all much appreciated!

Cheers,

Ben

--
Dr. Ben Kenward
Department of Psychology, Uppsala University, Sweden
http://www.benkenward.com
Achim Zeileis
2010-Nov-30 09:59 UTC
[R] researcher with highly skewed data set seeks help finding practical GLMM tutorial
On Tue, 30 Nov 2010, Ben Kenward wrote:

> With an N of 42 subjects, I have a highly left skewed (about half the
> data points are zero) frequency variable as dependent variable. This
> variable is measured in each subject in three different task types.
> There is furthermore a context variable with two levels. Each task was
> administered in each context, but not for every single subject.
>
> So the design is quite simple - two fixed factors (task and context),
> one random factor (subject), and an untransformably skewed dependent
> variable.

Given that you have frequency data with many zeros, some zero-augmented count data model might be useful, for example a hurdle model or a zero-inflated Poisson or negative binomial model. Both often lead to similar fits, but the hurdle model is typically easier to interpret. An overview using the "pscl" package is given in

  http://www.jstatsoft.org/v27/i08/

This implementation currently does not support random effects, though. But for a start, a hurdle() model with sandwich standard errors should help you find out whether this type of model suits your data.

If so, you might also want to have a look at the "gamlss" package, which supports a somewhat different implementation of ZIP models but has random effects. See

  http://www.jstatsoft.org/v23/i07/

hth,
Z
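As a rough sketch of the hurdle() suggestion above: the data below are simulated stand-ins (the column names freq, task, context and subject are hypothetical, not taken from the original data set), and clustering the sandwich covariance by subject via vcovCL() is an assumption about how to account for the repeated measures, not something prescribed in the thread.

```r
# Sketch only: simulated placeholder data, not the poster's data set.
library(pscl)      # hurdle()
library(sandwich)  # vcovCL()

set.seed(1)
# 42 subjects x 3 task types x 2 contexts (fully crossed here for simplicity)
dat <- expand.grid(subject = factor(1:42),
                   task    = factor(c("A", "B", "C")),
                   context = factor(c("ctx1", "ctx2")))

# Roughly half the observations are zero; the rest are positive counts.
is_pos   <- rbinom(nrow(dat), size = 1, prob = 0.5)
dat$freq <- is_pos * (1 + rpois(nrow(dat), lambda = 2 + (dat$task == "C")))

# Hurdle model: a binomial part for zero vs. positive,
# and a truncated Poisson part for the positive counts.
fit <- hurdle(freq ~ task + context, data = dat, dist = "poisson")

# Sandwich covariance, clustered by subject (an assumption, see above).
vc  <- vcovCL(fit, cluster = dat$subject)
est <- coef(fit)
se  <- sqrt(diag(vc))

# Coefficient table with robust standard errors and z statistics.
print(cbind(estimate = est, robust_se = se, z = est / se))
```

If the positive counts look overdispersed, dist = "negbin" is the other count distribution hurdle() offers. The model has no subject random effect, so it only serves as a first check of whether a zero-augmented model fits the data at all.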