Smith, Desmond
2018-Oct-19 21:57 UTC
[R] Fixed effects in negative binomial mixed model mgcv::gam
I am using gam from the mgcv package to analyze a dataset with 24 entries : ran f1 f2 y 1 3000 5 545 1 3000 10 1045 1 10000 5 536 1 10000 10 770 2 3000 5 842 2 3000 10 2042 2 10000 5 615 2 10000 10 1361 3 3000 5 328 3 3000 10 1028 3 10000 5 262 3 10000 10 722 4 3000 5 349 4 3000 10 665 4 10000 5 255 4 10000 10 470 5 3000 5 680 5 3000 10 1510 5 10000 5 499 5 10000 10 1422 6 3000 5 628 6 3000 10 2062 6 10000 5 499 6 10000 10 2158 The data has two fixed effects (f1 and f2) and one random effect (ran). The dependent data is y. Because the dependent data y represents counts and is overdispersed, I am using a negative binomial model. The gam model and its summary output is as follows: library(mgcv) summary(gam(y ~ f1 * f2 + s(ran, bs = "re"), data = df2, family = nb, method = "REML")) Family: Negative Binomial(27.376) Link function: log Formula: y ~ f1 * f2 + s(ran, bs = "re") Parametric coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 5.500e+00 3.137e-01 17.533 < 2e-16 *** f1 -3.421e-05 3.619e-05 -0.945 0.345 f2 1.760e-01 3.355e-02 5.247 1.55e-07 *** f1:f2 2.665e-07 4.554e-06 0.059 0.953 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Approximate significance of smooth terms: edf Ref.df Chi.sq p-value s(ran) 4.726 5 85.66 <2e-16 *** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 R-sq.(adj) = 0.866 Deviance explained = 93.6% -REML = 185.96 Scale est. = 1 n = 24 The Wald test from summary gives very high significance for f2 (P = 1.55e-07). However, when I test the significance of f2 by comparing two different models using anova, I get dramatically different results: anova(gam(y ~ f1 * f2 + s(ran, bs = "re"), data = df2, family = nb, method = "ML"), gam(y ~ f1 + s(ran, bs = "re"), data = df2, family = nb, method = "ML"), test="Chisq") Analysis of Deviance Table Model 1: y ~ f1 * f2 + s(ran, bs = "re") Model 2: y ~ f1 + s(ran, bs = "re") Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 14.843 18.340 2 16.652 21.529 -1.8091 -3.188 0.1752 f2 is no longer significant. The models were changed from REML to ML, as recommended for evaluation of fixed effects. If the interaction is preserved, f2 still remains insignificant using anova: anova(gam(y ~ f1 + f2 + f1:f2 + s(ran, bs = "re"), data = df2, family = nb, method = "ML"), gam(y ~ f1 + f1:f2 + s(ran, bs = "re"), data = df2, family = nb, method = "ML"), test="Chisq") Analysis of Deviance Table Model 1: y ~ f1 + f2 + f1:f2 + s(ran, bs = "re") Model 2: y ~ f1 + f1:f2 + s(ran, bs = "re") Resid. Df Resid. Dev Df Deviance Pr(>Chi) 1 14.843 18.340 2 15.645 19.194 -0.80159 -0.85391 0.2855 I would be very grateful for advice on which of these approaches is most appropriate. Many thanks! ________________________________ UCLA HEALTH SCIENCES IMPORTANT WARNING: This email (and any attachments) is only intended for the use of the person or entity to which it is addressed, and may contain information that is privileged and confidential. You, the recipient, are obligated to maintain it in a safe, secure and confidential manner. Unauthorized redisclosure or failure to maintain confidentiality may subject you to federal and state penalties. If you are not the intended recipient, please immediately notify us by return email, and delete this message from your computer. [[alternative HTML version deleted]]