Jessi Brown
2007-Feb-10 19:45 UTC
[R] error using user-defined link function with mixed models (LMER)
Greetings, everyone. I've been trying to analyze bird nest survival
data using generalized linear mixed models (because we documented
several consecutive nesting attempts by the same individuals; i.e.
repeated measures data) and have been unable to persuade the various
GLMM models to work with my user-defined link function. Actually,
glmmPQL seems to work, but as I want to evaluate a suite of competing
models, I'd like to use ML or REML estimation methods in order to end
up with meaningful log-likelihoods.
Here's the link function I use:
logexp <- function(days = 1)
{
linkfun <- function(mu) qlogis(mu^(1/days))
linkinv <- function(eta) plogis(eta)^days
mu.eta <- function(eta)
days*.Call("logit_mu_eta", eta,
PACKAGE = "stats")*plogis(eta)^(days-1)
valideta <- function(eta) TRUE
link <- paste("logexp(", days, ")", sep="")
structure(list(linkfun = linkfun, linkinv = linkinv,
mu.eta = mu.eta, valideta = valideta, name = link),
class = "link-glm")
}
# Modified binomial family function (that allows logexp link function)
logexposure<-function (link="logexp",ExposureDays) {
variance <- function(mu) mu * (1 - mu)
validmu <- function(mu) all(mu > 0) && all(mu < 1)
dev.resids <- function(y, mu, wt) .Call("binomial_dev_resids",
y, mu, wt, PACKAGE = "stats")
aic <- function(y, n, mu, wt, dev) {
m <- if (any(n > 1))
n
else wt
-2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m *
y), round(m), mu, log = TRUE))
}
initialize <- expression({
if (NCOL(y) == 1) {
if (is.factor(y)) y <- y != levels(y)[1]
n <- rep.int(1, nobs)
if (any(y < 0 | y > 1)) stop("y values must be 0 <= y
<= 1")
mustart <- (weights * y + 0.5)/(weights + 1)
m <- weights * y
if (any(abs(m - round(m)) > 0.001))
warning("non-integer successes in a binomial glm!")
} else if (NCOL(y) == 2) {
if (any(abs(y - round(y)) > 0.001))
warning("non-integer counts in a binomial glm!")
n <- y[, 1] + y[, 2]
y <- ifelse(n == 0, 0, y[, 1]/n)
weights <- weights * n
mustart <- (n * y + 0.5)/(n + 1)
} else stop("for the binomial family,",
" y must be a vector of 0 and 1's\n",
"or a 2 column matrix where col 1 is",
" no. successes and col 2 is no. failures")
})
structure(list(family="binomial", link=logexp(ExposureDays),
linkfun=logexp(ExposureDays)$linkfun,
linkinv=logexp(ExposureDays)$linkinv, variance=variance,
dev.resids=dev.resids, aic=aic,
mu.eta=logexp(ExposureDays)$mu.eta, initialize=initialize,
validmu=validmu, valideta=logexp$valideta), class = "family")
}
Now, here's how it works in a GLM:
> apfa.glm.1<-glm(Success~MeanAge+I(MeanAge^2),
family=logexposure(link="logexp", ExposureDays=apfa4$Days),
data=apfa4)
> summary(apfa.glm.1)
Call:
glm(formula = Success ~ MeanAge + I(MeanAge^2), family logexposure(link =
"logexp",
ExposureDays = apfa4$Days), data = apfa4)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.1525 0.2802 0.3637 0.4291 0.7599
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 5.5594830 0.6085542 9.136 <2e-16 ***
MeanAge -0.0908251 0.0407218 -2.230 0.0257 *
I(MeanAge^2) 0.0014926 0.0006104 2.445 0.0145 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 323.58 on 661 degrees of freedom
Residual deviance: 285.65 on 659 degrees of freedom
AIC: 291.65
Number of Fisher Scoring iterations: 6
Next, here's the results of a glmmPQL run:
> apfa.glmm.1<-glmmPQL(Success~MeanAge+I(MeanAge^2), random=~1|Territory,
family=logexposure(link="logexp", ExposureDays=apfa4$Days),
data=apfa4)
iteration 1> summary(apfa.glmm.1)
Linear mixed-effects model fit by maximum likelihood
Data: apfa4
AIC BIC logLik
NA NA NA
Random effects:
Formula: ~1 | Territory
(Intercept) Residual
StdDev: 0.0003431913 1.051947
Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: Success ~ MeanAge + I(MeanAge^2)
Value Std.Error DF t-value p-value
(Intercept) 5.559466 0.6416221 624 8.664705 0.0000
MeanAge -0.090824 0.0429346 624 -2.115397 0.0348
I(MeanAge^2) 0.001493 0.0006436 624 2.319090 0.0207
Correlation:
(Intr) MeanAg
MeanAge -0.927
I(MeanAge^2) 0.826 -0.968
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-11.3646020 0.1901969 0.2485473 0.2951632 0.5499915
Number of Observations: 662
Number of Groups: 36
Finally, here's what happens when I try to run an LMER model (same
error messages no matter which estimation method I choose):
> apfa.lmer.1<-lmer(Success~MeanAge+I(MeanAge^2)+(1|Territory),
data=apfa4, family=logexposure(link="logexp",
ExposureDays=apfa4$Days), method="Laplace")
> summary(apfa.lmer.1)
Error in if (any(sd < 0)) return("'sd' slot has negative
entries") :
missing value where TRUE/FALSE needed> names(apfa.lmer.1)
NULL
So, does anyone have any idea as to whether the problem is in the
user-defined link function as written, or have any thoughts about how
to get around this problem? If LMER and LME can't do it, could there
be some way to trick the glmmML function into accepting the
user-defined link function?
Thank you in advance for any help or advice.
cheers, Jessi Brown
Douglas Bates
2007-Feb-10 20:40 UTC
[R] error using user-defined link function with mixed models (LMER)
On 2/10/07, Jessi Brown <jessilbrown at gmail.com> wrote:> Greetings, everyone. I've been trying to analyze bird nest survival > data using generalized linear mixed models (because we documented > several consecutive nesting attempts by the same individuals; i.e. > repeated measures data) and have been unable to persuade the various > GLMM models to work with my user-defined link function. Actually, > glmmPQL seems to work, but as I want to evaluate a suite of competing > models, I'd like to use ML or REML estimation methods in order to end > up with meaningful log-likelihoods. > > Here's the link function I use: > logexp <- function(days = 1) > { > linkfun <- function(mu) qlogis(mu^(1/days)) > linkinv <- function(eta) plogis(eta)^days > mu.eta <- function(eta) > days*.Call("logit_mu_eta", eta, > PACKAGE = "stats")*plogis(eta)^(days-1) > valideta <- function(eta) TRUE > link <- paste("logexp(", days, ")", sep="") > structure(list(linkfun = linkfun, linkinv = linkinv, > mu.eta = mu.eta, valideta = valideta, name = link), > class = "link-glm") > } > > # Modified binomial family function (that allows logexp link function) > logexposure<-function (link="logexp",ExposureDays) { > variance <- function(mu) mu * (1 - mu) > validmu <- function(mu) all(mu > 0) && all(mu < 1) > dev.resids <- function(y, mu, wt) .Call("binomial_dev_resids", > y, mu, wt, PACKAGE = "stats") > aic <- function(y, n, mu, wt, dev) { > m <- if (any(n > 1)) > n > else wt > -2 * sum(ifelse(m > 0, (wt/m), 0) * dbinom(round(m * > y), round(m), mu, log = TRUE)) > } > initialize <- expression({ > if (NCOL(y) == 1) { > if (is.factor(y)) y <- y != levels(y)[1] > n <- rep.int(1, nobs) > if (any(y < 0 | y > 1)) stop("y values must be 0 <= y <= 1") > mustart <- (weights * y + 0.5)/(weights + 1) > m <- weights * y > if (any(abs(m - round(m)) > 0.001)) > warning("non-integer successes in a binomial glm!") > } else if (NCOL(y) == 2) { > if (any(abs(y - round(y)) > 0.001)) > warning("non-integer counts in a binomial glm!") > n <- y[, 1] + y[, 2] > y <- ifelse(n == 0, 0, y[, 1]/n) > weights <- weights * n > mustart <- (n * y + 0.5)/(n + 1) > } else stop("for the binomial family,", > " y must be a vector of 0 and 1's\n", > "or a 2 column matrix where col 1 is", > " no. successes and col 2 is no. failures") > }) > structure(list(family="binomial", link=logexp(ExposureDays), > linkfun=logexp(ExposureDays)$linkfun, > linkinv=logexp(ExposureDays)$linkinv, variance=variance, > dev.resids=dev.resids, aic=aic, > mu.eta=logexp(ExposureDays)$mu.eta, initialize=initialize, > validmu=validmu, valideta=logexp$valideta), class = "family") > } > > > > Now, here's how it works in a GLM: > > > apfa.glm.1<-glm(Success~MeanAge+I(MeanAge^2), family=logexposure(link="logexp", ExposureDays=apfa4$Days), data=apfa4) > > summary(apfa.glm.1) > > Call: > glm(formula = Success ~ MeanAge + I(MeanAge^2), family > logexposure(link = "logexp", > ExposureDays = apfa4$Days), data = apfa4) > > Deviance Residuals: > Min 1Q Median 3Q Max > -3.1525 0.2802 0.3637 0.4291 0.7599 > > Coefficients: > Estimate Std. Error z value Pr(>|z|) > (Intercept) 5.5594830 0.6085542 9.136 <2e-16 *** > MeanAge -0.0908251 0.0407218 -2.230 0.0257 * > I(MeanAge^2) 0.0014926 0.0006104 2.445 0.0145 * > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 323.58 on 661 degrees of freedom > Residual deviance: 285.65 on 659 degrees of freedom > AIC: 291.65 > > Number of Fisher Scoring iterations: 6 > > > > Next, here's the results of a glmmPQL run: > > > apfa.glmm.1<-glmmPQL(Success~MeanAge+I(MeanAge^2), random=~1|Territory, family=logexposure(link="logexp", ExposureDays=apfa4$Days), data=apfa4) > iteration 1 > > summary(apfa.glmm.1) > Linear mixed-effects model fit by maximum likelihood > Data: apfa4 > AIC BIC logLik > NA NA NA > > Random effects: > Formula: ~1 | Territory > (Intercept) Residual > StdDev: 0.0003431913 1.051947 > > Variance function: > Structure: fixed weights > Formula: ~invwt > Fixed effects: Success ~ MeanAge + I(MeanAge^2) > Value Std.Error DF t-value p-value > (Intercept) 5.559466 0.6416221 624 8.664705 0.0000 > MeanAge -0.090824 0.0429346 624 -2.115397 0.0348 > I(MeanAge^2) 0.001493 0.0006436 624 2.319090 0.0207 > Correlation: > (Intr) MeanAg > MeanAge -0.927 > I(MeanAge^2) 0.826 -0.968 > > Standardized Within-Group Residuals: > Min Q1 Med Q3 Max > -11.3646020 0.1901969 0.2485473 0.2951632 0.5499915 > > Number of Observations: 662 > Number of Groups: 36 > > > > Finally, here's what happens when I try to run an LMER model (same > error messages no matter which estimation method I choose): > > > apfa.lmer.1<-lmer(Success~MeanAge+I(MeanAge^2)+(1|Territory), data=apfa4, family=logexposure(link="logexp", ExposureDays=apfa4$Days), method="Laplace") > > summary(apfa.lmer.1) > Error in if (any(sd < 0)) return("'sd' slot has negative entries") : > missing value where TRUE/FALSE needed > > names(apfa.lmer.1) > NULL> So, does anyone have any idea as to whether the problem is in the > user-defined link function as written, or have any thoughts about how > to get around this problem? If LMER and LME can't do it, could there > be some way to trick the glmmML function into accepting the > user-defined link function?lmer is designed to work with arbitrary families but I haven't done a lot of testing outside the binomial and poisson families. Try looking at the structure of an instance of your family and comparing it to, say, str(binomial()) Make sure that all the components are there and have the correct form. The next step is to use verbose output so you can see the progress of the iterations. Add the optional argument control = list(msVerbose 1) to your call to lmer. Instead of immediately requesting a summary, use str(apfa.lmer.1) to check the structure. Again you may want to compare this description to that from str applied to a similar fit using the binomial family.
Ken Knoblauch
2007-Feb-11 15:15 UTC
[R] error using user-defined link function with mixed models (LMER)
Isn't it the case, that since R 2.40 that all one ought to need do is define one's own link and pass it to the family, rather re-defining the whole family? CHANGES IN R VERSION 2.4.0 ... o make.link() now returns an object of class "link-glm". The GLM families accept an object of this class for their 'link' argument, which allows user-specified link functions. Also, quasi() allows user-specified variance functions. I thought that was the point of the example on the family help page. ken Douglas Bates a ?crit : Look at the 'link' component of the two lists. In the binomial family object the link component is a character vector of link 1. In your logexposure family object it is a list of length 5. On 2/10/07, Jessi Brown <jessilbrown at gmail.com> wrote:> Ok, I've tried checking out the structure of the binomial and > logexposure families, the big difference appears to be the valideta > parameter (it's "NULL" in the logexposure family). >-- Ken Knoblauch Inserm U846 Institut Cellule Souche et Cerveau D?partement Neurosciences Int?gratives 18 avenue du Doyen L?pine 69500 Bron France tel: +33 (0)4 72 91 34 77 fax: +33 (0)4 72 91 34 61 portable: +33 (0)6 84 10 64 10 http://www.lyon.inserm.fr/371/