thr3ads.net - R help - [R] Likelihood ratio test between glm and glmer fits [Jul 2008]

If this information is useful, please help other people find it:
Share via:

COREY SPARKS

2008-Jul-16 16:08 UTC

[R] Likelihood ratio test between glm and glmer fits

Dear list,
I am fitting a logistic multi-level regression model and need to test the
difference between the ordinary logistic regression from a glm() fit and the
mixed effects fit from glmer(), basically I want to do a likelihood ratio test
between the two fits.


The data are like this:
My outcome is a (1,0) for health status, I have several (1,0) dummy variables
RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG, male, divorced, SINGLE,
chronic, vigor_d and moderat_d and AGE is continuous (20 to 100).
My higher level is called munid and has 581 levels.
The data have 45243 observations.

Here are my program statements:

#GLM fit
ph.fit.2<-glm(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d,family=binomial(),
data=mx.merge)
#GLMER fit
ph.fit.3<-glmer(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+INSURANCE+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d+(1|munid),family=binomial(),
data=mx.merge)

I cannot find a method in R that will do the LR test between a glm and a glmer
fit, so I try to do it using the liklihoods from both models

#form the likelihood ratio test between the glm and glmer fits
x2<--2*(logLik(ph.fit.2)-logLik(ph.fit.3))
 >     ML 79.60454 
attr(,"nobs")
    n 
45243 
attr(,"nall")
    n 
45243 
attr(,"df")
[1] 14
attr(,"REML")
[1] FALSE
attr(,"class")
[1] "logLik"

#Get the associated p-value
dchisq(x2,14)
         ML >5.94849e-15 
Which looks like an improvement in model fit to me.  Am I seeing this correctly
or are the two models even able to be compared? they are both estimated via
maximum likelihood, so they should be, I think.
Any help would be appreciated.

Corey

Corey S. Sparks, Ph.D.

Assistant Professor 
Department of Demography and Organization Studies
University of Texas San Antonio
One UTSA Circle 
San Antonio, TX 78249
email:corey.sparks@utsa.edu
web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm


	[[alternative HTML version deleted]]

Dimitris Rizopoulos

2008-Jul-16 18:22 UTC

head link

[R] Likelihood ratio test between glm and glmer fits

well, for computing the p-value you need to use pchisq() and dchisq()  
(check ?dchisq for more info). For model fits with a logLik method you  
can directly use the following simple function:

lrt <- function (obj1, obj2) {
     L0 <- logLik(obj1)
     L1 <- logLik(obj2)
     L01 <- as.vector(- 2 * (L0 - L1))
     df <- attr(L1, "df") - attr(L0, "df")
     list(L01 = L01, df = df,
         "p-value" = pchisq(L01, df, lower.tail = FALSE))
}

library(lme4)
gm0 <- glm(cbind(incidence, size - incidence) ~ period,
               family = binomial, data = cbpp)
gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
               family = binomial, data = cbpp)

lrt(gm0, gm1)


However, there are some issues regarding this likelihood ratio test.

1) The null hypothesis is on the boundary of the parameter space,  
i.e., you test whether the variance for the random effect is zero. For  
this case the assumed chi-squared distribution for the LRT may *not*  
be totally appropriate and may produce conservative p-values. There is  
some theory regarding this issue, which has shown that the reference  
distribution for the LRT in this case is a mixture of a chi-squared(df  
= 0) and chi-squared(df = 1). Another option is to use  
simulation-based approach where you can approximate the reference  
distribution of the LRT under the null using simulation. You may check  
below for an illustration of this procedure (not-tested):

X <- model.matrix(gm0)
coefs <- coef(gm0)
pr <- plogis(c(X %*% coefs))
n <- length(pr)
new.dat <- cbpp
Tobs <- lrt(gm0, gm1)$L01
B <- 200
out.T <- numeric(B)
for (b in 1:B) {
     y <- rbinom(n, cbpp$size, pr)
     new.dat$incidence <- y
     fit0 <- glm(formula(gm0), family = binomial, data = new.dat)
     fit1 <- glmer(formula(gm1), family = binomial, data = new.dat)
     out.T[b] <- lrt(fit0, fit1)$L01
}
# estimate p-value
(sum(out.T >= Tobs) + 1) / (B + 1)


2) For the glmer fit you have to note that you work with an  
approximation to the log-likelihood (obtained using numerical  
integration) and not the actual log-likelihood.

I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
      http://www.student.kuleuven.be/~m0390867/dimitris.htm


Quoting COREY SPARKS <corey.sparks at UTSA.EDU>:
> Dear list,
> I am fitting a logistic multi-level regression model and need to   
> test the difference between the ordinary logistic regression from a   
> glm() fit and the mixed effects fit from glmer(), basically I want   
> to do a likelihood ratio test between the two fits.
>
>
> The data are like this:
> My outcome is a (1,0) for health status, I have several (1,0) dummy   
> variables RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG, male,   
> divorced, SINGLE, chronic, vigor_d and moderat_d and AGE is   
> continuous (20 to 100).
> My higher level is called munid and has 581 levels.
> The data have 45243 observations.
>
> Here are my program statements:
>
> #GLM fit
>
ph.fit.2<-glm(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d,family=binomial(),
> data=mx.merge)
> #GLMER fit
>
ph.fit.3<-glmer(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+INSURANCE+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d+(1|munid),family=binomial(),
> data=mx.merge)
>
> I cannot find a method in R that will do the LR test between a glm   
> and a glmer fit, so I try to do it using the liklihoods from both   
> models
>
> #form the likelihood ratio test between the glm and glmer fits
> x2<--2*(logLik(ph.fit.2)-logLik(ph.fit.3))
>
>>     ML
> 79.60454
> attr(,"nobs")
>     n
> 45243
> attr(,"nall")
>     n
> 45243
> attr(,"df")
> [1] 14
> attr(,"REML")
> [1] FALSE
> attr(,"class")
> [1] "logLik"
>
> #Get the associated p-value
> dchisq(x2,14)
>          ML
>> 5.94849e-15
>
> Which looks like an improvement in model fit to me.  Am I seeing   
> this correctly or are the two models even able to be compared? they   
> are both estimated via maximum likelihood, so they should be, I think.
> Any help would be appreciated.
>
> Corey
>
> Corey S. Sparks, Ph.D.
>
> Assistant Professor
> Department of Demography and Organization Studies
> University of Texas San Antonio
> One UTSA Circle
> San Antonio, TX 78249
> email:corey.sparks at utsa.edu
> web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

Rune Haubo

2008-Jul-17 07:50 UTC

head link

[R] Likelihood ratio test between glm and glmer fits

2008/7/16 Dimitris Rizopoulos <Dimitris.Rizopoulos at
med.kuleuven.be>:> well, for computing the p-value you need to use pchisq() and dchisq()
(check
> ?dchisq for more info). For model fits with a logLik method you can
directly
> use the following simple function:
>
> lrt <- function (obj1, obj2) {
>    L0 <- logLik(obj1)
>    L1 <- logLik(obj2)
>    L01 <- as.vector(- 2 * (L0 - L1))
>    df <- attr(L1, "df") - attr(L0, "df")
>    list(L01 = L01, df = df,
>        "p-value" = pchisq(L01, df, lower.tail = FALSE))
> }
>
> library(lme4)
> gm0 <- glm(cbind(incidence, size - incidence) ~ period,
>              family = binomial, data = cbpp)
> gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
>              family = binomial, data = cbpp)
>
> lrt(gm0, gm1)
Yes, that seems quite natural, but then try to compare with the deviance:

logLik(gm0)
logLik(gm1)

(d0 <- deviance(gm0))
(d1 <- deviance(gm1))
(LR <- d0 - d1)
pchisq(LR, 1, lower = FALSE)

Obviously the deviance in glm is *not* twice the negative
log-likelihood as it is in glmer. The question remains which of these
two quantities is appropriate for comparison. I am not sure exactly
how the deviance and/or log-likelihood are calculated in glmer, but my
feeling is that one should trust the deviance rather than the
log-likelihoods for these purposes. This is supported by the following
comparison: Ad an arbitrary random effect with a close-to-zero
variance and note the deviance:

tmp <- rep(1:4, each = nrow(cbpp)/4)
gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | tmp),
             family = binomial, data = cbpp)
(d2 <- deviance(gm2))

This deviance is very close to that obtained from the glm model.

I have included the mixed-models mailing list in the hope that someone
could explain how the deviance is computed in glmer and why deviances,
but not likelihoods are comparable to glm-fits.

Best
Rune
>
>
> However, there are some issues regarding this likelihood ratio test.
>
> 1) The null hypothesis is on the boundary of the parameter space, i.e., you
> test whether the variance for the random effect is zero. For this case the
> assumed chi-squared distribution for the LRT may *not* be totally
> appropriate and may produce conservative p-values. There is some theory
> regarding this issue, which has shown that the reference distribution for
> the LRT in this case is a mixture of a chi-squared(df = 0) and
> chi-squared(df = 1). Another option is to use simulation-based approach
> where you can approximate the reference distribution of the LRT under the
> null using simulation. You may check below for an illustration of this
> procedure (not-tested):
>
> X <- model.matrix(gm0)
> coefs <- coef(gm0)
> pr <- plogis(c(X %*% coefs))
> n <- length(pr)
> new.dat <- cbpp
> Tobs <- lrt(gm0, gm1)$L01
> B <- 200
> out.T <- numeric(B)
> for (b in 1:B) {
>    y <- rbinom(n, cbpp$size, pr)
>    new.dat$incidence <- y
>    fit0 <- glm(formula(gm0), family = binomial, data = new.dat)
>    fit1 <- glmer(formula(gm1), family = binomial, data = new.dat)
>    out.T[b] <- lrt(fit0, fit1)$L01
> }
> # estimate p-value
> (sum(out.T >= Tobs) + 1) / (B + 1)
>
>
> 2) For the glmer fit you have to note that you work with an approximation
to
> the log-likelihood (obtained using numerical integration) and not the
actual
> log-likelihood.
>
> I hope it helps.
>
> Best,
> Dimitris
>
> ----
> Dimitris Rizopoulos
> Biostatistical Centre
> School of Public Health
> Catholic University of Leuven
>
> Address: Kapucijnenvoer 35, Leuven, Belgium
> Tel: +32/(0)16/336899
> Fax: +32/(0)16/337015
> Web: http://med.kuleuven.be/biostat/
>     http://www.student.kuleuven.be/~m0390867/dimitris.htm
>
>
> Quoting COREY SPARKS <corey.sparks at UTSA.EDU>:
>
>> Dear list,
>> I am fitting a logistic multi-level regression model and need to  test
the
>> difference between the ordinary logistic regression from a  glm() fit
and
>> the mixed effects fit from glmer(), basically I want  to do a
likelihood
>> ratio test between the two fits.
>>
>>
>> The data are like this:
>> My outcome is a (1,0) for health status, I have several (1,0) dummy
>>  variables RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG, male,
>>  divorced, SINGLE, chronic, vigor_d and moderat_d and AGE is 
continuous (20
>> to 100).
>> My higher level is called munid and has 581 levels.
>> The data have 45243 observations.
>>
>> Here are my program statements:
>>
>> #GLM fit
>>
>>
ph.fit.2<-glm(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d,family=binomial(),
>>  data=mx.merge)
>> #GLMER fit
>>
>>
ph.fit.3<-glmer(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+INSURANCE+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d+(1|munid),family=binomial(),
>>  data=mx.merge)
>>
>> I cannot find a method in R that will do the LR test between a glm  and
a
>> glmer fit, so I try to do it using the liklihoods from both  models
>>
>> #form the likelihood ratio test between the glm and glmer fits
>> x2<--2*(logLik(ph.fit.2)-logLik(ph.fit.3))
>>
>>>    ML
>>
>> 79.60454
>> attr(,"nobs")
>>    n
>> 45243
>> attr(,"nall")
>>    n
>> 45243
>> attr(,"df")
>> [1] 14
>> attr(,"REML")
>> [1] FALSE
>> attr(,"class")
>> [1] "logLik"
>>
>> #Get the associated p-value
>> dchisq(x2,14)
>>         ML
>>>
>>> 5.94849e-15
>>
>> Which looks like an improvement in model fit to me.  Am I seeing  this
>> correctly or are the two models even able to be compared? they  are
both
>> estimated via maximum likelihood, so they should be, I think.
>> Any help would be appreciated.
>>
>> Corey
>>
>> Corey S. Sparks, Ph.D.
>>
>> Assistant Professor
>> Department of Demography and Organization Studies
>> University of Texas San Antonio
>> One UTSA Circle
>> San Antonio, TX 78249
>> email:corey.sparks at utsa.edu
>> web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
>>
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Rune Haubo Bojesen Christensen

Master Student, M.Sc. Eng.
Phone: (+45) 30 26 45 54
Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com

DTU Informatics, Section for Statistics
Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby, Denmark

Douglas Bates

2008-Jul-17 11:29 UTC

head link

[R] Likelihood ratio test between glm and glmer fits

On Thu, Jul 17, 2008 at 2:50 AM, Rune Haubo <rhbc at imm.dtu.dk>
wrote:> 2008/7/16 Dimitris Rizopoulos <Dimitris.Rizopoulos at
med.kuleuven.be>:
>> well, for computing the p-value you need to use pchisq() and dchisq()
(check
>> ?dchisq for more info). For model fits with a logLik method you can
directly
>> use the following simple function:
>>
>> lrt <- function (obj1, obj2) {
>>    L0 <- logLik(obj1)
>>    L1 <- logLik(obj2)
>>    L01 <- as.vector(- 2 * (L0 - L1))
>>    df <- attr(L1, "df") - attr(L0, "df")
>>    list(L01 = L01, df = df,
>>        "p-value" = pchisq(L01, df, lower.tail = FALSE))
>> }
>>
>> library(lme4)
>> gm0 <- glm(cbind(incidence, size - incidence) ~ period,
>>              family = binomial, data = cbpp)
>> gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 |
herd),
>>              family = binomial, data = cbpp)
>>
>> lrt(gm0, gm1)
>
> Yes, that seems quite natural, but then try to compare with the deviance:
>
> logLik(gm0)
> logLik(gm1)
>
> (d0 <- deviance(gm0))
> (d1 <- deviance(gm1))
> (LR <- d0 - d1)
> pchisq(LR, 1, lower = FALSE)
>
> Obviously the deviance in glm is *not* twice the negative
> log-likelihood as it is in glmer. The question remains which of these
> two quantities is appropriate for comparison. I am not sure exactly
> how the deviance and/or log-likelihood are calculated in glmer, but my
> feeling is that one should trust the deviance rather than the
> log-likelihoods for these purposes. This is supported by the following
> comparison: Ad an arbitrary random effect with a close-to-zero
> variance and note the deviance:
>
> tmp <- rep(1:4, each = nrow(cbpp)/4)
> gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | tmp),
>             family = binomial, data = cbpp)
> (d2 <- deviance(gm2))
>
> This deviance is very close to that obtained from the glm model.
>
> I have included the mixed-models mailing list in the hope that someone
> could explain how the deviance is computed in glmer and why deviances,
> but not likelihoods are comparable to glm-fits.
In that example I think the problem may be that I have not yet written
the code to adjust the deviance of the glmer fit for the null
deviance.
>> However, there are some issues regarding this likelihood ratio test.
>>
>> 1) The null hypothesis is on the boundary of the parameter space, i.e.,
you
>> test whether the variance for the random effect is zero. For this case
the
>> assumed chi-squared distribution for the LRT may *not* be totally
>> appropriate and may produce conservative p-values. There is some theory
>> regarding this issue, which has shown that the reference distribution
for
>> the LRT in this case is a mixture of a chi-squared(df = 0) and
>> chi-squared(df = 1). Another option is to use simulation-based approach
>> where you can approximate the reference distribution of the LRT under
the
>> null using simulation. You may check below for an illustration of this
>> procedure (not-tested):
>>
>> X <- model.matrix(gm0)
>> coefs <- coef(gm0)
>> pr <- plogis(c(X %*% coefs))
>> n <- length(pr)
>> new.dat <- cbpp
>> Tobs <- lrt(gm0, gm1)$L01
>> B <- 200
>> out.T <- numeric(B)
>> for (b in 1:B) {
>>    y <- rbinom(n, cbpp$size, pr)
>>    new.dat$incidence <- y
>>    fit0 <- glm(formula(gm0), family = binomial, data = new.dat)
>>    fit1 <- glmer(formula(gm1), family = binomial, data = new.dat)
>>    out.T[b] <- lrt(fit0, fit1)$L01
>> }
>> # estimate p-value
>> (sum(out.T >= Tobs) + 1) / (B + 1)
>>
>>
>> 2) For the glmer fit you have to note that you work with an
approximation to
>> the log-likelihood (obtained using numerical integration) and not the
actual
>> log-likelihood.
>>
>> I hope it helps.
>>
>> Best,
>> Dimitris
>>
>> ----
>> Dimitris Rizopoulos
>> Biostatistical Centre
>> School of Public Health
>> Catholic University of Leuven
>>
>> Address: Kapucijnenvoer 35, Leuven, Belgium
>> Tel: +32/(0)16/336899
>> Fax: +32/(0)16/337015
>> Web: http://med.kuleuven.be/biostat/
>>     http://www.student.kuleuven.be/~m0390867/dimitris.htm
>>
>>
>> Quoting COREY SPARKS <corey.sparks at UTSA.EDU>:
>>
>>> Dear list,
>>> I am fitting a logistic multi-level regression model and need to 
test the
>>> difference between the ordinary logistic regression from a  glm()
fit and
>>> the mixed effects fit from glmer(), basically I want  to do a
likelihood
>>> ratio test between the two fits.
>>>
>>>
>>> The data are like this:
>>> My outcome is a (1,0) for health status, I have several (1,0) dummy
>>>  variables RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG, male,
>>>  divorced, SINGLE, chronic, vigor_d and moderat_d and AGE is 
continuous (20
>>> to 100).
>>> My higher level is called munid and has 581 levels.
>>> The data have 45243 observations.
>>>
>>> Here are my program statements:
>>>
>>> #GLM fit
>>>
>>>
ph.fit.2<-glm(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d,family=binomial(),
>>>  data=mx.merge)
>>> #GLMER fit
>>>
>>>
ph.fit.3<-glmer(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+INSURANCE+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d+(1|munid),family=binomial(),
>>>  data=mx.merge)
>>>
>>> I cannot find a method in R that will do the LR test between a glm 
and a
>>> glmer fit, so I try to do it using the liklihoods from both  models
>>>
>>> #form the likelihood ratio test between the glm and glmer fits
>>> x2<--2*(logLik(ph.fit.2)-logLik(ph.fit.3))
>>>
>>>>    ML
>>>
>>> 79.60454
>>> attr(,"nobs")
>>>    n
>>> 45243
>>> attr(,"nall")
>>>    n
>>> 45243
>>> attr(,"df")
>>> [1] 14
>>> attr(,"REML")
>>> [1] FALSE
>>> attr(,"class")
>>> [1] "logLik"
>>>
>>> #Get the associated p-value
>>> dchisq(x2,14)
>>>         ML
>>>>
>>>> 5.94849e-15
>>>
>>> Which looks like an improvement in model fit to me.  Am I seeing 
this
>>> correctly or are the two models even able to be compared? they  are
both
>>> estimated via maximum likelihood, so they should be, I think.
>>> Any help would be appreciated.
>>>
>>> Corey
>>>
>>> Corey S. Sparks, Ph.D.
>>>
>>> Assistant Professor
>>> Department of Demography and Organization Studies
>>> University of Texas San Antonio
>>> One UTSA Circle
>>> San Antonio, TX 78249
>>> email:corey.sparks at utsa.edu
>>> web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
>>>
>>>
>>>        [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
>>
>> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Rune Haubo Bojesen Christensen
>
> Master Student, M.Sc. Eng.
> Phone: (+45) 30 26 45 54
> Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com
>
> DTU Informatics, Section for Statistics
> Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby, Denmark
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Göran Broström

2008-Jul-19 21:51 UTC

head link

[R] Likelihood ratio test between glm and glmer fits

This particular case with a random intercept model can be handled by
glmmML, by bootstrapping the p-value.

Best, G?ran

On Thu, Jul 17, 2008 at 1:29 PM, Douglas Bates <bates at stat.wisc.edu>
wrote:> On Thu, Jul 17, 2008 at 2:50 AM, Rune Haubo <rhbc at imm.dtu.dk>
wrote:
>> 2008/7/16 Dimitris Rizopoulos <Dimitris.Rizopoulos at
med.kuleuven.be>:
>>> well, for computing the p-value you need to use pchisq() and
dchisq() (check
>>> ?dchisq for more info). For model fits with a logLik method you can
directly
>>> use the following simple function:
>>>
>>> lrt <- function (obj1, obj2) {
>>>    L0 <- logLik(obj1)
>>>    L1 <- logLik(obj2)
>>>    L01 <- as.vector(- 2 * (L0 - L1))
>>>    df <- attr(L1, "df") - attr(L0, "df")
>>>    list(L01 = L01, df = df,
>>>        "p-value" = pchisq(L01, df, lower.tail = FALSE))
>>> }
>>>
>>> library(lme4)
>>> gm0 <- glm(cbind(incidence, size - incidence) ~ period,
>>>              family = binomial, data = cbpp)
>>> gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 |
herd),
>>>              family = binomial, data = cbpp)
>>>
>>> lrt(gm0, gm1)
>>
>> Yes, that seems quite natural, but then try to compare with the
deviance:
>>
>> logLik(gm0)
>> logLik(gm1)
>>
>> (d0 <- deviance(gm0))
>> (d1 <- deviance(gm1))
>> (LR <- d0 - d1)
>> pchisq(LR, 1, lower = FALSE)
>>
>> Obviously the deviance in glm is *not* twice the negative
>> log-likelihood as it is in glmer. The question remains which of these
>> two quantities is appropriate for comparison. I am not sure exactly
>> how the deviance and/or log-likelihood are calculated in glmer, but my
>> feeling is that one should trust the deviance rather than the
>> log-likelihoods for these purposes. This is supported by the following
>> comparison: Ad an arbitrary random effect with a close-to-zero
>> variance and note the deviance:
>>
>> tmp <- rep(1:4, each = nrow(cbpp)/4)
>> gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 |
tmp),
>>             family = binomial, data = cbpp)
>> (d2 <- deviance(gm2))
>>
>> This deviance is very close to that obtained from the glm model.
>>
>> I have included the mixed-models mailing list in the hope that someone
>> could explain how the deviance is computed in glmer and why deviances,
>> but not likelihoods are comparable to glm-fits.
>
> In that example I think the problem may be that I have not yet written
> the code to adjust the deviance of the glmer fit for the null
> deviance.
>
>>> However, there are some issues regarding this likelihood ratio
test.
>>>
>>> 1) The null hypothesis is on the boundary of the parameter space,
i.e., you
>>> test whether the variance for the random effect is zero. For this
case the
>>> assumed chi-squared distribution for the LRT may *not* be totally
>>> appropriate and may produce conservative p-values. There is some
theory
>>> regarding this issue, which has shown that the reference
distribution for
>>> the LRT in this case is a mixture of a chi-squared(df = 0) and
>>> chi-squared(df = 1). Another option is to use simulation-based
approach
>>> where you can approximate the reference distribution of the LRT
under the
>>> null using simulation. You may check below for an illustration of
this
>>> procedure (not-tested):
>>>
>>> X <- model.matrix(gm0)
>>> coefs <- coef(gm0)
>>> pr <- plogis(c(X %*% coefs))
>>> n <- length(pr)
>>> new.dat <- cbpp
>>> Tobs <- lrt(gm0, gm1)$L01
>>> B <- 200
>>> out.T <- numeric(B)
>>> for (b in 1:B) {
>>>    y <- rbinom(n, cbpp$size, pr)
>>>    new.dat$incidence <- y
>>>    fit0 <- glm(formula(gm0), family = binomial, data = new.dat)
>>>    fit1 <- glmer(formula(gm1), family = binomial, data =
new.dat)
>>>    out.T[b] <- lrt(fit0, fit1)$L01
>>> }
>>> # estimate p-value
>>> (sum(out.T >= Tobs) + 1) / (B + 1)
>>>
>>>
>>> 2) For the glmer fit you have to note that you work with an
approximation to
>>> the log-likelihood (obtained using numerical integration) and not
the actual
>>> log-likelihood.
>>>
>>> I hope it helps.
>>>
>>> Best,
>>> Dimitris
>>>
>>> ----
>>> Dimitris Rizopoulos
>>> Biostatistical Centre
>>> School of Public Health
>>> Catholic University of Leuven
>>>
>>> Address: Kapucijnenvoer 35, Leuven, Belgium
>>> Tel: +32/(0)16/336899
>>> Fax: +32/(0)16/337015
>>> Web: http://med.kuleuven.be/biostat/
>>>     http://www.student.kuleuven.be/~m0390867/dimitris.htm
>>>
>>>
>>> Quoting COREY SPARKS <corey.sparks at UTSA.EDU>:
>>>
>>>> Dear list,
>>>> I am fitting a logistic multi-level regression model and need
to  test the
>>>> difference between the ordinary logistic regression from a 
glm() fit and
>>>> the mixed effects fit from glmer(), basically I want  to do a
likelihood
>>>> ratio test between the two fits.
>>>>
>>>>
>>>> The data are like this:
>>>> My outcome is a (1,0) for health status, I have several (1,0)
dummy
>>>>  variables RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG,
male,
>>>>  divorced, SINGLE, chronic, vigor_d and moderat_d and AGE is 
continuous (20
>>>> to 100).
>>>> My higher level is called munid and has 581 levels.
>>>> The data have 45243 observations.
>>>>
>>>> Here are my program statements:
>>>>
>>>> #GLM fit
>>>>
>>>>
ph.fit.2<-glm(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d,family=binomial(),
>>>>  data=mx.merge)
>>>> #GLMER fit
>>>>
>>>>
ph.fit.3<-glmer(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+INSURANCE+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d+(1|munid),family=binomial(),
>>>>  data=mx.merge)
>>>>
>>>> I cannot find a method in R that will do the LR test between a
glm  and a
>>>> glmer fit, so I try to do it using the liklihoods from both 
models
>>>>
>>>> #form the likelihood ratio test between the glm and glmer fits
>>>> x2<--2*(logLik(ph.fit.2)-logLik(ph.fit.3))
>>>>
>>>>>    ML
>>>>
>>>> 79.60454
>>>> attr(,"nobs")
>>>>    n
>>>> 45243
>>>> attr(,"nall")
>>>>    n
>>>> 45243
>>>> attr(,"df")
>>>> [1] 14
>>>> attr(,"REML")
>>>> [1] FALSE
>>>> attr(,"class")
>>>> [1] "logLik"
>>>>
>>>> #Get the associated p-value
>>>> dchisq(x2,14)
>>>>         ML
>>>>>
>>>>> 5.94849e-15
>>>>
>>>> Which looks like an improvement in model fit to me.  Am I
seeing  this
>>>> correctly or are the two models even able to be compared? they 
are both
>>>> estimated via maximum likelihood, so they should be, I think.
>>>> Any help would be appreciated.
>>>>
>>>> Corey
>>>>
>>>> Corey S. Sparks, Ph.D.
>>>>
>>>> Assistant Professor
>>>> Department of Demography and Organization Studies
>>>> University of Texas San Antonio
>>>> One UTSA Circle
>>>> San Antonio, TX 78249
>>>> email:corey.sparks at utsa.edu
>>>> web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
>>>>
>>>>
>>>>        [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>>>>
>>>>
>>>
>>>
>>>
>>> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Rune Haubo Bojesen Christensen
>>
>> Master Student, M.Sc. Eng.
>> Phone: (+45) 30 26 45 54
>> Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com
>>
>> DTU Informatics, Section for Statistics
>> Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby,
Denmark
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
G?ran Brostr?m

Reasonably Related Threads

Search for more possibly parallel threads

R help - Jul 2008 - Likelihood ratio test between glm and glmer fits

[R] Likelihood ratio test between glm and glmer fits

[R] Likelihood ratio test between glm and glmer fits

[R] Likelihood ratio test between glm and glmer fits

[R] Likelihood ratio test between glm and glmer fits

[R] Likelihood ratio test between glm and glmer fits

Reasonably Related Threads