Bolker,Benjamin Michael
2009-Oct-21  03:57 UTC
[Rd] odd evaluation within correlation argument of glmmPQL
[I think I've seen this reported before but can't locate it any more.
I believe this oddity (glitch? feature?) is behind a query that
Jean-Baptiste Ferdy asked a year ago 
<http://finzi.psych.upenn.edu/Rhelp08/2008-October/176449.html>]
  It appears that glmmPQL looks in the global workspace, not
within the data frame specified by the "data" argument, for
the variables specified in the "form" argument of spatial
correlation structures provided to the "correlation" argument.
This is potentially confusing/dangerous, because you can easily
(1) get an error saying that the variable is not found (the least
dangerous and confusing), (2) get an error saying the variable
is of the wrong length, if there is a variable with a matching
name but the wrong length in the global environment,
(3) get completely misled if there is a *different* variable with
a matching name in the global environment with the right
length (in which case glmmPQL will ignore the element in the data
frame you specified and use the one from the global environment).
  The following patch (to MASS 7.3-3, retrieved from CRAN src/contrib
2.10.0 directory) appears to fix the problem.  A quick look suggests that
the same version of MASS is in 2.11.0 as well.
*** glmmPQL.R.orig	2009-10-20 23:48:38.000000000 -0400
--- glmmPQL.R	2009-10-20 23:51:33.000000000 -0400
***************
*** 39,44 ****
--- 39,47 ----
      off <- attr(Terms, "offset")
      if(length(off<- attr(Terms, "offset"))) allvars <-
          c(allvars, as.character(attr(Terms, "variables"))[off+1])
+     ## add variables in correlation formula, if any
+     if (!missing(correlation) &&
!is.null(attr(correlation,"formula")))
+       allvars <- c(allvars,all.vars(attr(correlation,"formula")))
      ## substitute back actual formula (rather than a variable name)
      Call$fixed <- eval(fixed); Call$random <- eval(random)
      m$formula <- as.formula(paste("~", paste(allvars,
collapse="+")))
 code below shows the problem.
    Ben Bolker
## set up data
set.seed(1001)
d = data.frame(x = runif(100),
  y = runif(100),
  z = rpois(100,2),
  f = factor(rep(letters[1:20],each=5)))
library(MASS)
## without correlation structure: fine
glmmPQL(z~1,random=~1|f,data=d,family="poisson")
## with correlation structure,
##   but x and y not defined outside of 'd'
glmmPQL(z~1,random=~1|f,data=d,
        correlation=corExp(form=~x+y),
        family="poisson")
## Error in eval(expr, envir, enclos) : object 'x' not found
x = y = runif(10)
glmmPQL(z~1,random=~1|f,data=d,
        correlation=corExp(form=~x+y),
        family="poisson")
## Error in ...
##   variable lengths differ (found for 'f')
## this is what we probably wanted
x = d$x
y = d$y
glmmPQL(z~1,random=~1|f,data=d,
        correlation=corExp(form=~x+y),
        family="poisson")
## this is dangerous!
x = y = runif(100)
glmmPQL(z~1,random=~1|f,data=d,
        correlation=corExp(form=~x+y),
        family="poisson")
## proof that glmmPQL really is looking
## outside of the data frame for its
## correlation variables ...
x = y = rep(0,100)
glmmPQL(z~1,random=~1|f,data=d,
        correlation=corExp(form=~x+y),
        family="poisson")
## Error in getCovariate.corSpatial(object, data = data) : 
##   Cannot have zero distances in "corSpatial"
> sessionInfo()
R version 2.9.2 (2009-08-24) 
i486-pc-linux-gnu 
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  grid      methods  
[8] base     
other attached packages:
[1] nlme_3.1-95   MASS_7.2-49   reshape_0.8.3 plyr_0.1.9    proto_0.3-8  
loaded via a namespace (and not attached):
[1] ggplot2_0.8.4   lattice_0.17-26
Ben Bolker
2009-Oct-27  19:05 UTC
[Rd] odd evaluation within correlation argument of glmmPQL
Ben Bolker wrote:> > > [snip] > > It appears that glmmPQL looks in the global workspace, not > within the data frame specified by the "data" argument, for > the variables specified in the "form" argument of spatial > correlation structures provided to the "correlation" argument. > > [snip example and patches] > >Replying to my own e-mail to bump this (it's been a week). The basic issue is that glmmPQL throws out data that are not involved in the fixed or random model terms, or in the offset term. It doesn't save variables that are only found in the correlation formula argument. It's hard to imagine how this behavior is anything other than a bug ... and the patch fixes it in a straightforward way. cheers Ben Bolker -- View this message in context: http://www.nabble.com/odd-evaluation-within-correlation-argument-of-glmmPQL-tp25986187p26083255.html Sent from the R devel mailing list archive at Nabble.com.