jerome@hivnet.ubc.ca
2003-Aug-07 00:41 UTC
[Rd] model.frame() call from inside a function (PR#3671)
R version: 1.7.1 OS: Red Hat Linux 7.2 Hi all, The formula object in model.frame() is not retrieved properly when model.frame() is called from within a function and the "subset" argument is supplied. foo <- function(formula,data,subset=NULL) { cat("\n*****Does formula[-3] == ~y ?**** TRUE *****\n") print(formula[-3] == ~y) cat("\n*****Result of model.frame() using formula[-3]**** FAIL *****\n") print(try(model.frame(formula[-3],data=data,subset=subset))) cat("\n*****Result of model.frame() using ~y**** WORKS *****\n") print(try(model.frame(~y,data=data,subset=subset))) } dat <- data.frame(y=c(5,25)) foo(y~1,dat) Curiously, if the "subset" argument is removed from the call to model.frame(), then the execution is successful in both cases. In ?model.frame, one can read: Variables in the formula, `subset' and in `...' are looked for first in `data' and then in the environment of `formula': see the help for `formula()' for further details. However, replacing the line subset <- eval(substitute(subset), data, env) by subset <- eval(substitute(subset), data, environment()) in model.frame.default() fixes this problem. I don't know if this correction would create more problems in other cases. Perhaps there is a better fix. Sincerely, Jerome Asselin -- Jerome Asselin (Jérôme), Statistical Analyst British Columbia Centre for Excellence in HIV/AIDS St. Paul's Hospital, 608 - 1081 Burrard Street Vancouver, British Columbia, CANADA V6Z 1Y6 Email: jerome@hivnet.ubc.ca Phone: 604 806-9112 Fax: 604 806-9044
Peter Dalgaard BSA
2003-Aug-07 10:07 UTC
[Rd] model.frame() call from inside a function (PR#3671)
jerome@hivnet.ubc.ca writes:> R version: 1.7.1 > OS: Red Hat Linux 7.2 > > Hi all, > > The formula object in model.frame() is not retrieved properly when > model.frame() is called from within a function and the "subset" argument > is supplied. > > foo <- function(formula,data,subset=NULL) > { > cat("\n*****Does formula[-3] == ~y ?**** TRUE *****\n") > print(formula[-3] == ~y) > > cat("\n*****Result of model.frame() using formula[-3]**** FAIL *****\n") > print(try(model.frame(formula[-3],data=data,subset=subset))) > > cat("\n*****Result of model.frame() using ~y**** WORKS *****\n") > print(try(model.frame(~y,data=data,subset=subset))) > } > dat <- data.frame(y=c(5,25)) > foo(y~1,dat) > > Curiously, if the "subset" argument is removed from the call to > model.frame(), then the execution is successful in both cases. > > In ?model.frame, one can read: > Variables in the formula, `subset' and in `...' are looked for > first in `data' and then in the environment of `formula': see the > help for `formula()' for further details. > > However, replacing the line > subset <- eval(substitute(subset), data, env) > by > subset <- eval(substitute(subset), data, environment()) > in model.frame.default() fixes this problem. I don't know if this > correction would create more problems in other cases. Perhaps there is a > better fix.There is really nothing to fix, at least if you go by the rule that it is only a bug if it behaves contrary to documentation: There is no "subset" in the environment of "formula", nor in the "data". If you put one there, the error goes away> subset<-NULL > foo(y~1,dat,subset=1)*****Does formula[-3] == ~y ?**** TRUE ***** [1] TRUE *****Result of model.frame() using formula[-3]**** FAIL ***** y 1 5 2 25 *****Result of model.frame() using ~y**** WORKS ***** y 1 5 However, notice that it is not the same subset. There's a whole area of similar nastiness grouped under the heading of "nonstandard evaluation rules". The basic issue is that you will often assume that the variables used for subsetting comes from the same place as those in the model, e.g. in lm(fat~age,subset=sex=="male"). The problem is that it gets really awkward when a function wants to compute the subset variable and combine it with a formula passed as an argument. And it only gets worse when arguments can be both scalar and vector, e.g. plot(fat~age, col=as.numeric(sex)) function(mycolor="green") plot(fat~age, col=mycolor) We have discussed changing this on several occasions, e.g. by requiring that arguments that need to be evaluated in the formula environment or the data frame should be either model formulas themselves or quoted expressions. However, that would break S-PLUS compatibility and also a large body of existing analysis code. [[ I did discover yesterday (or maybe I was just reminded...) that we even have nonstandard nonstandard evaluation rules in some places (nls() seems to evaluate its model formula in the global environment even if it is given explicitly within a function: f <- function() { g <- function(a,x) exp(-a*x) nls(y~g(a,x),start=list(a=.1)) } x <- 1:10 y <- exp(-.12*x)+rnorm(10,sd=.001) f() Error in eval(expr, envir, enclos) : couldn't find function "g" Argh...]] -- O__ ---- Peter Dalgaard Blegdamsvej 3 c/ /'_ --- Dept. of Biostatistics 2200 Cph. N (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907
Saikat DebRoy
2003-Aug-07 16:28 UTC
[Rd] model.frame() call from inside a function (PR#3671)
That attachment disappeared from the previous mail. Here is the patch I mentioned. diff -Naur src/library/nls/R/nls.R /tmp/R/nls.R --- src/library/nls/R/nls.R Mon Jul 28 11:11:06 2003 +++ /tmp/R/nls.R Thu Aug 7 10:04:49 2003 @@ -38,7 +38,7 @@ nlsModel.plinear <- function( form, data, start ) { thisEnv <- environment() - env <- new.env() + env <- new.env(parent=environment(form)) for( i in names( data ) ) { assign( i, data[[i]], envir = env ) } @@ -216,7 +216,7 @@ }, predict = function(newdata = list(), qr = FALSE) { - Env <- new.env() + Env <- new.env(parent=environment(form)) for (i in objects(envir = env)) { assign(i, get(i, envir = env), envir = Env) } @@ -234,7 +234,7 @@ nlsModel <- function( form, data, start ) { thisEnv <- environment() - env <- new.env() + env <- new.env(parent=environment(form)) for( i in names( data ) ) { assign( i, data[[i]], envir = env ) } @@ -374,7 +374,7 @@ Rmat = function() qr.R( QR ), predict = function(newdata = list(), qr = FALSE) { - Env <- new.env() + Env <- new.env(parent=environment(form)) for (i in objects(envir = env)) { assign(i, get(i, envir = env), envir = Env) } @@ -398,7 +398,9 @@ subset, weights, na.action) { mf <- match.call() # for creating the model frame + env <- environment(formula) formula <- as.formula( formula ) + varNames <- all.vars(formula) # parameter and variable names from formula ## adjust a one-sided model formula by using 0 as the response @@ -435,7 +437,7 @@ mf$formula <- # replace RHS by linear model formula parse( text = paste("~", paste( varNames[varIndex], collapse = "+")))[[1]] - + environment(mf$formula) <- env mf$start <- mf$control <- mf$algorithm <- mf$trace <- NULL mf[[1]] <- as.name("model.frame") mf <- as.list(eval(mf, parent.frame()))