-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 WARNING: this is long. Sorry I couldn't find a way to compress it. Is there a reasonable way to design an update method so that it's robust to a variety of reasonable use cases of generating calls or data inside or outside a function? Is it even possible? Should I just tell users "don't do that"? * `update.default()` uses `eval(call, parent.frame())`; this fails when the call depends on objects that were defined in a different environment (e.g., when the data are generated and the model initially fitted within a function scope) * an alternative is to store the original environment in which the fitting is done in the environment of the formula and use `eval(call, env=environment(formula(object)))`; this fails if the user tries to update the model originally fitted outside a function with data modified within a function ... * I think I've got a hack that works below, which first tries in the environment of the formula and falls back to the parent frame if that fails, but I wonder if I'm missing something much simpler .. Thoughts? My understanding of environments and frames is still, after all these years, not what it should be ... I've thought of some other workarounds, none entirely satisfactory: * force evaluation of all elements in the original call * printing components of the call can get ugly (can save the original call before evaluating) * large objects in the call get duplicated * don't use `eval(call)` for updates; instead try to store everything internally * this works OK but has the same drawback of potentially storing large extra copies * we could try to use the model frame (which is stored already), but there are issues with this (the basis of a whole separate rant) because the model frame stores something in between predictor variables and input variables. For example d <- data.frame(y=1:10,x=runif(10)) names(model.frame(lm(y~log(x),data=d))) ## "y" "log(x)" So if we wanted to do something like update to "y ~ sqrt(x)", it wouldn't work ... =================update.envformula <- function(object,...) { extras <- match.call(expand.dots = FALSE)$... call <- getCall(object) for (i in names(extras)) { existing <- !is.na(match(names(extras), names(call))) for (a in names(extras)[existing]) call[[a]] <- extras[[a]] if (any(!existing)) { call <- c(as.list(call), extras[!existing]) call <- as.call(call) } } eval(call, env=environment(formula(object))) ## enclos=parent.frame() doesn't help } update.both <- function(object,...) { extras <- match.call(expand.dots = FALSE)$... call <- getCall(object) for (i in names(extras)) { existing <- !is.na(match(names(extras), names(call))) for (a in names(extras)[existing]) call[[a]] <- extras[[a]] if (any(!existing)) { call <- c(as.list(call), extras[!existing]) call <- as.call(call) } } pf <- parent.frame() ## save parent frame in case we need it tryCatch(eval(call, env=environment(formula(object))), error=function(e) { eval(call, pf) }) } ### TEST CASES set.seed(101) d <- data.frame(x=1:10,y=rnorm(10)) m1 <- lm(y~x,data=d) ##' define data within function, return fitted model f1 <- function() { d2 <- d lm(y~x,data=d2) return(lm(y~x,data=d2)) } ##' define (and modify) data within function, try to update ##' model fitted elsewhere f2 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ## force copy update.default(m,data=d2) } ##' define (and modify) data within function, try to update ##' model fitted elsewhere (use envformula) f3 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ## force copy update.envformula(m,data=d2) } ##' hack: first try the formula, then the parent frame ##' if that doesn't work for any reason f4 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ## force copy update.both(m,data=d2) } ## Case 1: fit within function m2 <- f1() try(update.default(m2)) ## default: object 'd2' not found m3A <- update.envformula(m2) ## envformula: works m3B <- update.both(m2) ## works ## Case 2: update within function m4A <- f2(m1) ## default: works try(f3(m1)) ## envformula: object 'd2' not found m4B <- f4(m1) ## works -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAEBAgAGBQJVDvHBAAoJEOCV5YRblxUHtnwH/2452Fqmsm//LxNeXxhNrs/G 3ss8rPV2EVJQFjAyO34zzuYZVp1mWlgt7C1L7nXcH4lcf66gYfj75X/yVvMxxwmS uXEP9HKr4TR5D6Ukf16qqtK41PhTkjS+RDaEpj6BVq9YxlYb53kSjKF+anrf08SL K6i2L+c4nJIwk56CCp0Re/EIDCNeW5lXYX9jdPAalMoxSHTG8wmytvq0x89+KsRC aUTpdylPka70vwBQle9W4OEyhBpADIHfEg8csYXZ/MeOKM0bqCRu2IU3OyuCsxyX Pbo3w1Z7x0e+2WoX4PcltweYK4PJC9O10v+RKYaI5YRy1dFWMQWcPoAKRKf7jwY=S7zP -----END PGP SIGNATURE-----
Dear Ben, Last week I was struggling with incorporating lme4 into a package. I traced the problem and made a reproducible example ( https://github.com/ThierryO/testlme4). It looks very simular to the problem you describe. The 'tests' directory contains the reproducible examples. confint() of a model as returned by a function fails. It even fails when I try to calculate the confint() inside the same function as the glmer() call (see the fit_model_ci function). Best regards, Thierry ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat 25 1070 Anderlecht Belgium To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey 2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>:> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > WARNING: this is long. Sorry I couldn't find a way to compress it. > > Is there a reasonable way to design an update method so that it's > robust to a variety of reasonable use cases of generating calls or > data inside or outside a function? Is it even possible? Should I > just tell users "don't do that"? > > * `update.default()` uses `eval(call, parent.frame())`; this fails > when the call depends on objects that were defined in a different > environment (e.g., when the data are generated and the model > initially fitted within a function scope) > > * an alternative is to store the original environment in which the > fitting is done in the environment of the formula and use > `eval(call, env=environment(formula(object)))`; this fails if the > user tries to update the model originally fitted outside a function > with data modified within a function ... > > * I think I've got a hack that works below, which first tries in the > environment of the formula and falls back to the parent frame if > that fails, but I wonder if I'm missing something much simpler .. > > Thoughts? My understanding of environments and frames is still, > after all these years, not what it should be ... > > I've thought of some other workarounds, none entirely satisfactory: > > * force evaluation of all elements in the original call > * printing components of the call can get ugly (can save the > original call before evaluating) > * large objects in the call get duplicated > * don't use `eval(call)` for updates; instead try to store everything > internally > * this works OK but has the same drawback of potentially storing > large extra copies > * we could try to use the model frame (which is stored already), > but there are issues with this (the basis of a whole separate rant) > because the model frame stores something in between predictor > variables and input variables. For example > > d <- data.frame(y=1:10,x=runif(10)) > names(model.frame(lm(y~log(x),data=d))) > ## "y" "log(x)" > > So if we wanted to do something like update to "y ~ sqrt(x)", > it wouldn't work ... > > =================> update.envformula <- function(object,...) { > extras <- match.call(expand.dots = FALSE)$... > call <- getCall(object) > for (i in names(extras)) { > existing <- !is.na(match(names(extras), names(call))) > for (a in names(extras)[existing]) call[[a]] <- extras[[a]] > if (any(!existing)) { > call <- c(as.list(call), extras[!existing]) > call <- as.call(call) > } > } > eval(call, env=environment(formula(object))) > ## enclos=parent.frame() doesn't help > } > > update.both <- function(object,...) { > extras <- match.call(expand.dots = FALSE)$... > call <- getCall(object) > for (i in names(extras)) { > existing <- !is.na(match(names(extras), names(call))) > for (a in names(extras)[existing]) call[[a]] <- extras[[a]] > if (any(!existing)) { > call <- c(as.list(call), extras[!existing]) > call <- as.call(call) > } > } > pf <- parent.frame() ## save parent frame in case we need it > tryCatch(eval(call, env=environment(formula(object))), > error=function(e) { > eval(call, pf) > }) > } > > ### TEST CASES > > set.seed(101) > d <- data.frame(x=1:10,y=rnorm(10)) > m1 <- lm(y~x,data=d) > > ##' define data within function, return fitted model > f1 <- function() { > d2 <- d > lm(y~x,data=d2) > return(lm(y~x,data=d2)) > } > ##' define (and modify) data within function, try to update > ##' model fitted elsewhere > f2 <- function(m) { > d2 <- d; d2[1] <- d2[1]+0 ## force copy > update.default(m,data=d2) > } > ##' define (and modify) data within function, try to update > ##' model fitted elsewhere (use envformula) > f3 <- function(m) { > d2 <- d; d2[1] <- d2[1]+0 ## force copy > update.envformula(m,data=d2) > } > > ##' hack: first try the formula, then the parent frame > ##' if that doesn't work for any reason > f4 <- function(m) { > d2 <- d; d2[1] <- d2[1]+0 ## force copy > update.both(m,data=d2) > } > > ## Case 1: fit within function > m2 <- f1() > try(update.default(m2)) ## default: object 'd2' not found > m3A <- update.envformula(m2) ## envformula: works > m3B <- update.both(m2) ## works > > ## Case 2: update within function > m4A <- f2(m1) ## default: works > try(f3(m1)) ## envformula: object 'd2' not found > m4B <- f4(m1) ## works > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.11 (GNU/Linux) > > iQEcBAEBAgAGBQJVDvHBAAoJEOCV5YRblxUHtnwH/2452Fqmsm//LxNeXxhNrs/G > 3ss8rPV2EVJQFjAyO34zzuYZVp1mWlgt7C1L7nXcH4lcf66gYfj75X/yVvMxxwmS > uXEP9HKr4TR5D6Ukf16qqtK41PhTkjS+RDaEpj6BVq9YxlYb53kSjKF+anrf08SL > K6i2L+c4nJIwk56CCp0Re/EIDCNeW5lXYX9jdPAalMoxSHTG8wmytvq0x89+KsRC > aUTpdylPka70vwBQle9W4OEyhBpADIHfEg8csYXZ/MeOKM0bqCRu2IU3OyuCsxyX > Pbo3w1Z7x0e+2WoX4PcltweYK4PJC9O10v+RKYaI5YRy1dFWMQWcPoAKRKf7jwY> =S7zP > -----END PGP SIGNATURE----- > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >[[alternative HTML version deleted]]
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 15-03-23 12:55 PM, Thierry Onkelinx wrote:> Dear Ben, > > Last week I was struggling with incorporating lme4 into a package. > I traced the problem and made a reproducible example ( > https://github.com/ThierryO/testlme4). It looks very simular to > the problem you describe. > > The 'tests' directory contains the reproducible examples. confint() > of a model as returned by a function fails. It even fails when I > try to calculate the confint() inside the same function as the > glmer() call (see the fit_model_ci function). > > Best regards, > > ThierryUgh. I can get this to work if I also try searching up the call stack, as follows (within update.merMod). This feels like "code smell" to me though -- i.e., if I have to hack this hard I must be doing something wrong/misunderstanding how the problem *should* be done. if (evaluate) { ff <- environment(formula(object)) pf <- parent.frame() ## save parent frame in case we need it sf <- sys.frames()[[1]] tryCatch(eval(call, env=ff), error=function(e) { tryCatch(eval(call, env=sf), error=function(e) { eval(call, pf) }) }) } else call Here is an adapted even-more-minimal version of your code, which seems to work with the version of update.merMod I just pushed to github, but fails for glm(): ## https://github.com/ThierryO/testlme4/blob/master/R/fit_model_ci.R fit_model_ci <- function(formula, dataset, mfun=glmer){ model <- mfun( formula = formula, data = dataset, family = "poisson" ) ci <- confint(model) return(list(model = model, confint = ci)) } library("lme4") set.seed(101) dd <- data.frame(f=factor(rep(1:10,each=100)), y=rpois(2,1000)) fit_model_ci(y~(1|f),dataset=dd) fit_model_ci(y~(1|f),dataset=dd,mfun=glm)> > > ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / > Research Institute for Nature and Forest team Biometrie & > Kwaliteitszorg / team Biometrics & Quality Assurance Kliniekstraat > 25 1070 Anderlecht Belgium > > To call in the statistician after the experiment is done may be no > more than asking him to perform a post-mortem examination: he may > be able to say what the experiment died of. ~ Sir Ronald Aylmer > Fisher The plural of anecdote is not data. ~ Roger Brinner The > combination of some data and an aching desire for an answer does > not ensure that a reasonable answer can be extracted from a given > body of data. ~ John Tukey > > 2015-03-22 17:45 GMT+01:00 Ben Bolker <bbolker at gmail.com>: > > WARNING: this is long. Sorry I couldn't find a way to compress > it. > > Is there a reasonable way to design an update method so that it's > robust to a variety of reasonable use cases of generating calls or > data inside or outside a function? Is it even possible? Should I > just tell users "don't do that"? > > * `update.default()` uses `eval(call, parent.frame())`; this fails > when the call depends on objects that were defined in a different > environment (e.g., when the data are generated and the model > initially fitted within a function scope) > > * an alternative is to store the original environment in which the > fitting is done in the environment of the formula and use > `eval(call, env=environment(formula(object)))`; this fails if the > user tries to update the model originally fitted outside a > function with data modified within a function ... > > * I think I've got a hack that works below, which first tries in > the environment of the formula and falls back to the parent frame > if that fails, but I wonder if I'm missing something much simpler > .. > > Thoughts? My understanding of environments and frames is still, > after all these years, not what it should be ... > > I've thought of some other workarounds, none entirely > satisfactory: > > * force evaluation of all elements in the original call * printing > components of the call can get ugly (can save the original call > before evaluating) * large objects in the call get duplicated * > don't use `eval(call)` for updates; instead try to store > everything internally * this works OK but has the same drawback of > potentially storing large extra copies * we could try to use the > model frame (which is stored already), but there are issues with > this (the basis of a whole separate rant) because the model frame > stores something in between predictor variables and input > variables. For example > > d <- data.frame(y=1:10,x=runif(10)) > names(model.frame(lm(y~log(x),data=d))) ## "y" "log(x)" > > So if we wanted to do something like update to "y ~ sqrt(x)", it > wouldn't work ... > > ================== update.envformula <- function(object,...) { > extras <- match.call(expand.dots = FALSE)$... call <- > getCall(object) for (i in names(extras)) { existing <- > !is.na(match(names(extras), names(call))) for (a in > names(extras)[existing]) call[[a]] <- extras[[a]] if > (any(!existing)) { call <- c(as.list(call), extras[!existing]) call > <- as.call(call) } } eval(call, env=environment(formula(object))) > ## enclos=parent.frame() doesn't help } > > update.both <- function(object,...) { extras <- > match.call(expand.dots = FALSE)$... call <- getCall(object) for (i > in names(extras)) { existing <- !is.na(match(names(extras), > names(call))) for (a in names(extras)[existing]) call[[a]] <- > extras[[a]] if (any(!existing)) { call <- c(as.list(call), > extras[!existing]) call <- as.call(call) } } pf <- parent.frame() > ## save parent frame in case we need it tryCatch(eval(call, > env=environment(formula(object))), error=function(e) { eval(call, > pf) }) } > > ### TEST CASES > > set.seed(101) d <- data.frame(x=1:10,y=rnorm(10)) m1 <- > lm(y~x,data=d) > > ##' define data within function, return fitted model f1 <- > function() { d2 <- d lm(y~x,data=d2) return(lm(y~x,data=d2)) } ##' > define (and modify) data within function, try to update ##' model > fitted elsewhere f2 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ## > force copy update.default(m,data=d2) } ##' define (and modify) data > within function, try to update ##' model fitted elsewhere (use > envformula) f3 <- function(m) { d2 <- d; d2[1] <- d2[1]+0 ## force > copy update.envformula(m,data=d2) } > > ##' hack: first try the formula, then the parent frame ##' if that > doesn't work for any reason f4 <- function(m) { d2 <- d; d2[1] <- > d2[1]+0 ## force copy update.both(m,data=d2) } > > ## Case 1: fit within function m2 <- f1() try(update.default(m2)) > ## default: object 'd2' not found m3A <- update.envformula(m2) ## > envformula: works m3B <- update.both(m2) ## works > > ## Case 2: update within function m4A <- f2(m1) ## default: works > try(f3(m1)) ## envformula: object 'd2' not found m4B <- f4(m1) > ## works > >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> >-----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAEBAgAGBQJVEfl/AAoJEOCV5YRblxUHWdgH/AqLAhDqKV8aRg6jnX9rO96D nwzqv0ClMIxVr2dzD4eSQTL2caWZnXVkws+lg9N7bc4BaWplcYxLNRBw5M8zHOPJ E7JlhG3EecvmeAEt9OY0/q6I0D6vdoEjcH7wzzuyLLIqllu9OskxURi/azMs0XRo tiN+oG5aOKsMYsEGjtiWySRDzhJh2TM40A1HHjAViqpxZcqilAZ6RiNEFe1t1JY0 IvDI8yesSuHnKtgAiqk9ivGw4BCCGoBSIHB3GrJIi11j06iYKw0ugVHIlKYO8cqf AYTvEX2sSxsJgKWYTiG/1dr/kiFTntTDji03zRLVUdPKIZATJMczv+KB+0bpoVY=Z34K -----END PGP SIGNATURE-----