I''m fitting generalized linear mixed models to using several fixed effects (main effects and a couple of interactions) and a grouping factor (site) to explain the variation in a dichotomous response variable (family=binomial). I wanted to compare the output I obtained using PROC GLIMMIX in SAS with that obtained using lmer in R (version 2.6.1 in Windows). When using lmer I''m specifying method="PQL" so as to make the estimation method comparable between lmer and GLIMMIX. It is difficult to compare the outputs for the interaction terms because SAS gives the estimates and significance value for each of the categories, whereas R gives a single estimate for the interaction term. But, from the main effects it is possible to see very similar estimates obtained with either program. I am very interested in the interaction term SEX*ELI, and this term comes up as significant in SAS and nonsignificant in R. Why could this be? It is very worrisome to think of reporting a significant result that is not validated when doing a similar analysis using a different program! Can somebody help me interpret these differences? Bellow is a summary of the outputs obtained with R and SAS. Thanks, Andrea Previtali Post-doc fellow Dept. of Biology, Univ. of Utah. lmer output: Generalized linear mixed model fit using PQL Formula: SURV ~ SEX * ELI + DW * DIST + SEAS + DEN + WT + (1 | SITE) Family: binomial(logit link) AIC BIC logLik deviance 1539 1606 -758.7 1517 Random effects: Groups Name Variance Std.Dev. SITE (Intercept) 0.27816 0.52741 number of obs: 3104, groups: SITE, 19 Estimated scale (compare to 1 ) 0.9458749 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.144259 0.458672 -2.495 0.012606 * SEX -0.606026 0.167289 -3.623 0.000292 *** ELI -0.190757 0.219599 -0.869 0.385034 DW -0.328796 0.175882 -1.869 0.061565 . DIST -0.117745 0.374148 -0.315 0.752989 SEAS -0.784971 0.158748 -4.945 7.62e-07 *** DEN -0.013381 0.002585 -5.176 2.27e-07 *** WT 0.007735 0.019115 0.405 0.685732 SEX:ELI -0.466425 0.461596 -1.010 0.312274 DW:DIST -1.015454 0.404683 -2.509 0.012099 * ----------------------------------------------------------------------------------- GLIMMIX output: Model Information Variance Matrix Blocked By Site Estimation Technique: Residual PL Degrees of Freedom Method: Containment Fit Statistics -2 Res Log Pseudo-Likelihood: 17868.73 Pseudo-AIC: 17890.73 Pseudo-BIC: 17957.14 Covariance Parameter Estimates Cov Parm Subject Estimate Std Error Intercept Site 0.2975 0.1799 Solutions for Fixed Effects Effect DIST DW ELI SEX SEAS Estimate Std Error DF t Value Pr> |t| Intercept -4.6540 0.6878 17 -6.77 F DIST*DW 3 3077 6.06 0.0004 SEX*ELI 3 3077 6.30 0.0003 WT 1 3077 0.16 0.6918 SEAS 1 3077 24.37

Sorry, I realized that somehow the message got truncated. Here is the remaining part of the SAS output: Solutions for Fixed Effects: Effect DIST DW ELI SEX SEAS Estimate Std. Error DF t Value Pr > |t| Intercept -4.6540 0.6878 17 -6.77 <.0001 DIST*DW 0 0 1.4641 0.4115 3077 3.56 0.0004 DIST*DW 0 1 1.1333 0.4028 3077 2.81 0.0049 DIST*DW 1 0 1.3456 0.3745 3077 3.59 0.0003 DIST*DW 1 1 0 . . . . SEX*ELI 0 0 1.2633 0.4155 3077 3.04 0.0024 SEX*ELI 0 1 0.6569 0.4140 3077 1.59 0.1126 SEX*ELI 1 0 1.0728 0.4364 3077 2.46 0.0140 SEX*ELI 1 1 0 . . . . WT 0.00758 0.01912 3077 0.40 0.6918 SEAS 0 0.7839 0.1588 3077 4.94 <.0001 SEAS 1 0 . . . . DEN -0.01343 0.002588 3077 -5.19 <.0001 Type III Tests of Fixed Effects Effect NUM.DF DEN.DF F Value Pr > F DIST*DW 3 3077 6.06 0.0004 SEX*ELI 3 3077 6.30 0.0003 WT 1 3077 0.16 0.6918 SEAS 1 3077 24.37 <.0001 DEN 1 3077 26.94 <.0001 -- View this message in context: http://www.nabble.com/GLMMs-fitted-with-lmer-%28R%29---glimmix-%28SAS%29-tp14623190p14627716.html Sent from the R help mailing list archive at Nabble.com.

On Jan 4, 2008 6:21 PM, Andrea Previtali <aprevitali at hotmail.com> wrote:> > Sorry, I realized that somehow the message got truncated. Here is the > remaining part of the SAS output: > > Solutions for Fixed Effects: > > Effect DIST DW ELI SEX SEAS Estimate Std. Error > DF t Value Pr > |t| > > Intercept -4.6540 > 0.6878 17 -6.77 <.0001 > > DIST*DW 0 0 1.4641 0.4115 3077 > 3.56 0.0004 > DIST*DW 0 1 1.1333 0.4028 3077 > 2.81 0.0049 > DIST*DW 1 0 1.3456 0.3745 > 3077 3.59 0.0003 > DIST*DW 1 1 0 . . > . . > > SEX*ELI 0 0 1.2633 0.4155 > 3077 3.04 0.0024 > SEX*ELI 0 1 0.6569 0.4140 3077 1.59 > 0.1126 > SEX*ELI 1 0 1.0728 0.4364 > 3077 2.46 0.0140 > SEX*ELI 1 1 0 . > . . . > > WT 0.00758 0.01912 > 3077 0.40 0.6918 > > SEAS 0 0.7839 0.1588 > 3077 4.94 <.0001 > > SEAS 1 0 . > . . . > > DEN -0.01343 0.002588 > 3077 -5.19 <.0001 > > Type III Tests of Fixed Effects > > Effect NUM.DF DEN.DF F Value > Pr > F > DIST*DW 3 3077 6.06 > 0.0004 > SEX*ELI 3 3077 6.30 > 0.0003 > WT 1 3077 0.16 > 0.6918 > SEAS 1 3077 24.37 > <.0001 > DEN 1 3077 26.94 > <.0001At least on my mail reader the copies of the output ended up with wrapped lines and, apparently, some changes in the spacing. I enclose two text files, glimmix.txt and glmer.txt, that are my reconstructions of the originals. Please let me know if I have not reconstructed them correctly. In particular, i don''t think I got the first table of "Solutions for Fixed Effects:" in the glimmix.txt file correct. It seems to mix t statistics and F statistics in ways that I don''t understand. Another thing I don''t understand is what the Pseudo-Likelihood is. Perhaps it is what I would call the penalized weighted residual sum of squares. The likelihood reported by lmer and based on the binomial distribution is very different. If you want to compare coefficients I suggest using options(contrasts = c("contr.SAS", "contr.poly") and assure that SEX, DIST, DW and ELI are factors, then call lmer. This will ensure that the SEX, DIST, DW and ELI terms and their interactions are represented by contrasts in which the last level is the reference level (the SAS convention) as opposed to the first level (the R convention). Also, you may be confusing the S language formula terms with the SAS formula terms. In R the asterisk denotes crossing of terms and the : is used for an interaction. Thus SEX*ELI is equivalent to SEX + ELI + SEX:ELI in R. In SAS, it is the interaction that is written as SEX*ELI. I suggest that you change your SAS formula to include main effects for SEX, ELI, DIST and DW. -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: glmer.txt Url: https://stat.ethz.ch/pipermail/r-help/attachments/20080106/d0a31621/attachment-0004.txt -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: glmmmix.txt Url: https://stat.ethz.ch/pipermail/r-help/attachments/20080106/d0a31621/attachment-0005.txt

Thanks for your responses. I knew that when you include an interaction term in a model you must include the main effects of each of the factors. Therefore, I assumed that SAS will do that by default. In most statistical packages, as in R, the main effects are automatically added when you include an interaction. But after reading your response I became suspicious and, it turns out, that SAS was indeed not including the main effects. To make sure that the main effects are part of the model in SAS you need to specify: sex eli sex*eli or use sex|eli. When I did this the estimates for the interaction terms now match closely those from R and, yes, the interaction between sex and infection status is now non significant. Here is the relevant part of the new SAS output: Model Information Variance Matrix Blocked By Site Estimation Technique: Residual PL Degrees of Freedom Method: Containment Fit Statistics -2 Res Log Pseudo-Likelihood: 17868.73 Pseudo-AIC: 17890.73 Pseudo-BIC: 17957.14 Covariance Parameter Estimates Cov Parm Subject Estimate Std Error Intercept Site 0.2975 0.1799 Solutions for Fixed Effects Effect SEX ELI DW DIST SEA Estimate Std. Error DF t Value Pr > |t| Intercept -1.1427 0.4611 17 -2.48 0.0240 SEX 0 -0.6064 0.1673 3077 -3.62 0.0003 SEX 1 0 . . . . ELI 0 -0.1905 0.2197 3077 -0.87 0.3858 ELI 1 0 . . . . SEX*ELI 0 0 -0.4664 0.4617 3077 -1.01 0.3125 SEX*ELI 0 1 0 . . . . SEX*ELI 1 0 0 . . . . SEX*ELI 1 1 0 . . . . DW 0 -0.3308 0.1760 3077 -1.88 0.0603 DW 1 0 . . . . DIST 0 -0.1185 0.3816 3077 -0.31 0.7562 DIST 1 0 . . . . DW*DIST 0 0 -1.0148 0.4064 3077 -2.50 0.0126 DW*DIST 0 1 0 . . . . DW*DIST 1 0 0 . . . . DW*DIST 1 1 0 . . . . SEAS 0 -0.7839 0.1588 3077 -4.94 <.0001 SEAS 1 0 . . . . DEN -0.01343 0.002588 3077 -5.19 <.0001 WT 0.00758 0.01912 3077 0.40 0.6918 And here is again the R output using lmer fro easy comparison: Generalized linear mixed model fit using PQL Formula: SURV ~ SEX * ELI + DW * DIST + SEAS + DEN + WT + (1 | SITE) Family: binomial(logit link) AIC BIC logLik deviance 1539 1606 -758.7 1517 Random effects: Groups Name Variance Std.Dev. SITE (Intercept) 0.27816 0.52741 number of obs: 3104, groups: SITE, 19 Estimated scale (compare to 1 ) 0.9458749 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) -1.144259 0.458672 -2.495 0.012606 SEX -0.606026 0.167289 -3.623 0.000292 *** ELI -0.190757 0.219599 -0.869 0.385034 DW -0.328796 0.175882 -1.869 0.061565 . DIST -0.117745 0.374148 -0.315 0.752989 SEAS -0.784971 0.158748 -4.945 7.62e-07 *** DEN -0.013381 0.002585 -5.176 2.27e-07 *** WT 0.007735 0.019115 0.405 0.685732 SEX:ELI -0.466425 0.461596 -1.010 0.312274 DW:DIST -1.015454 0.404683 -2.509 0.012099 * Now it is strange that after I fitted the model correctly the estimate for the random factor did not change, as well as the fit statistics. How can this be? Regarding the question of What is the Pseudo-Likelihood that is part of the output in SAS and not in R? The manual for the glimmix procedure (which can be found here http://support.sas.com/rnd/app/da/glimmix.html http://support.sas.com/rnd/app/da/glimmix.html ) says that "For a model containing random effects, the GLIMMIX procedure, by default, estimates the parameters by applying pseudo-likelihood techniques as in Wolfinger and O?Connell (1993) and Breslow and Clayton (1993)." That is, by default SAS uses the PQL method that as David Buffy said, it is just an approximation. The procedure involves a series of optimizations obtained via iterative estimation methods based on linearizations (using Taylor series expansions). After each optimization, a new pseudo-model is constructed for the mean response. All the fit statistics (AIC, BIC, etc) that SAS reports are calculated from the likelihood of the final "pseudo" model, thus the term "pseudo-likelihood." The problem is that then these criteria cannot be used to compare models; which is too bad because, at least in my field, it is preferable to use information theory than p-values based on arbitrary set significance levels (see Anderson Burnham and Thompson, 2000). What are the likelihood values that we obtain when using lmer in R? Can they be used to compare models? -- View this message in context: http://www.nabble.com/GLMMs-fitted-with-lmer-%28R%29---glimmix-%28SAS%29-tp14623190p14764018.html Sent from the R help mailing list archive at Nabble.com.