Jessica S Veysey
2009-Jan-07 21:22 UTC
[R] fixed effect significance_NB mixed models_further pursuit
7 Jan 09 Hello, I am using R version 2.7.0 in a Windows XP context. I am also using the glmm.admb package (created by Dave Fournier, Hans Skaug, and Anders Nielson) to run mixed-effects negative binomial models. To the best of my knowledge and ability, I have searched and studied the R-help, R-sig-mixed models, and ADMB NBMM for R (through Otter Research Ltd) list servs; R help documentation; and the web in general for information pertaining to the use and interpretation of the glmm.admb package. My models are able to run without problems. I would, however, like to be able to state whether each fixed-effect predictor used in my regression models is ?significant,? in the hypothesis-testing-sense of ?significance.? I am seeking advice on how best to assess the significance / importance of individual predictors in mixed-effects negative binomial regressions. From what I gather, there is no consensus on how best to go about this; although there are two main lines of thought (at least as would apply to someone in my case with limited statistical and programming experience): 1) Log-likelihood ratio tests (LLRT), comparing the saturated model to reduced models (wherein each predictor is dropped in turn); and 2) Wald tests, using parameter estimates and parameter standard errors to calculate the Wald statistic. I understand there are problems inherent to each approach. Following the logic of D. Bates (R-sig-mixed models list serv: 13 Oct 08), I decided to pursue the LLRT approach. Still, I have several questions. If this is my model: m <- glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2, random = ~ 1, group = ?subject?, family = ?nbinom?, data = dataset) and b1 = a categorical variable with 3 levels b2 = a continuous variable and I?ve set my contrasts as follows: options(contrasts =c("contr.treatment", "contr.poly")) dataset$b1 <- factor (as.character(dataset$b1), levels=c("L1", "L2", "L3")) My questions are: 1)Is it possible to test the significance of the individual predictors, for those predictors that are also used in the interaction term (i.e., for b1 and b2, individually, not just the b1:b2 interaction)? Using the LLRT approach, when I drop one of these individual predictors (e.g., b2) without also dropping its interaction term (i.e., b1:b2), I obtain a reduced model with a loglikehood estimate that is equal to the loglikehood estimate of the saturated model. A LLRT between this reduced and saturated model has 0 degrees of freedom (because the same number of parameters is estimated for both the reduced and the saturated model); as in the output below: Model 1: glmm.admb(fixed = y ~ b1 + I(b2^2) + b1:b2? Model 2: glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2?. NoPar LogLik Df -2logQ P.value 1 10.00 -190.66 2 10.00 -190.66 0 0.00 0.00 I thought this outcome might arise because the program might automatically implement a ?heredity? rule (see R-sig-mixed models list serv: A. Robinson, 1 Mar. 07), whereby if an interaction is included in a model, all lesser components of the interaction are also automatically included in the model. A summary of the fixed effects of the reduced model shows that there is no such heredity rule in action (i.e., there is no parameter estimate for b2 alone): Formula: y ~ b1 + I(b2^2) + b1:b2 (Intercept) b1L2 8.9272e-01 -1.5081e+00 b1L3 I(b2^2) -6.3097e-01 -7.2719e-05 b1L1:b2 b1L2:b2 2.1573e-02 2.5767e-02 b1L3:b2 2.3130e-02 Ultimately, is it possible to test and discuss the significance of these individual predictors? (If so, how?) Or is it really only practical to discuss the significance of the interaction (and if the interaction is not significant to remove the interaction from the model, and then retest the significance of the individual predictors)? 2) In a similar vein, if I use a LLRT to compare a reduced model with the squared term (i.e., I(b2^2)) removed, to the saturated model, I obtain different LL values for the two models, but 0 degrees of freedom for the test (because the same number of parameters are being estimated in the two models), as shown below. Model 1: glmm.admb(fixed = y ~ b1 + b2 + b1:b2? Model 2: glmm.admb(fixed = y ~ b1 + b2 + I(b2^2) + b1:b2?. NoPar LogLik Df -2logQ P.value 1 10.00 -191.22 2 10.00 -190.66 0 1.13 0.00 Is there an alternative / better way to test for the significance of the squared term (i.e., other than an LLRT as I have set it up?) 3) How can I assess the significance of contrasts between different levels of the categorical predictor (i.e., b1)? I am used to the summary.glme command in S+ 8.0, which generates: parameter estimates, parameter standard errors, t values, and p-values (describing the significance of the t values) for each level of a categorical predictor, compared to the reference level for that predictor. In a manner similar to summary.glme, would it be appropriate to conduct a Wald test of parameter-estimate significance for each level of the categorical predictor, compared to the reference level? If not, is there a different approach I could use? If it would be appropriate, how could I generate the parameter standard errors (which are necessary in order to calculate the Wald statistics)? This question of how to calculate fixed-effect parameter standard errors for glmm.admb-generated models, has been posed, but only partially answered, several times on the aforementioned list servs (e.g., R-help list serv: H. Skaug, 25 Feb 06). I know that, using the glmm.admb package, I could call: m$stdbeta to obtain the standard deviations of the fixed effect estimates. However, this is insufficient for calculating Wald statistics or even for generating confidence intervals, as both require standard errors, not standard deviations. Cameron and Trivedi (Regression Analysis of Count Data; 1998, p 62-69), provide a formula for easily calculating negative binomial parameter standard errors from maximum likelihood Hessian standard errors (i.e., the standard errors produced via a Poisson regression, assuming equidispersion), but since these are not automatically produced in the glmm.admb output, this is of little help to me. Any help that could be provided with respect to any of my questions would be greatly appreciated. I apologize in advance for the length of this post, but wanted to include all of my questions ? since they are interrelated. I can easily provide additional information related to my examples, if needed. Thank you for your help, J.S. Veysey Doctoral Student University of New Hampshire Durham, NH USA