Philipp Benner reported a Debian bug report against r-cran-rpart aka rpart.
In short, the issue has to do with how rpart evaluates a formula and
supporting arguments, in particular 'weights'.

A simple contrived example is

-----------------------------------------------------------------------------
library(rpart)

## using data from help(rpart), set up simple example
myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis
myweight <- abs(rnorm(nrow(mydata)))

goodFunction <- function(mydata, myformula, myweight) {
  hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
  prev <- hyp
}
goodFunction(mydata, myformula, myweight)
cat("Ok\n")

## now remove myweight and try to compute it inside a function
rm(myweight)

badFunction <- function(mydata, myformula) {
  myweight <- abs(rnorm(nrow(mydata)))
  mf <- model.frame(myformula, mydata, myweight)
  print(head(df))
  hyp <- rpart(myformula,
               data=mf,
               weights=myweight,
               method="class")
  prev <- hyp
}
badFunction(mydata, myformula)
cat("Done\n")
-----------------------------------------------------------------------------

Here goodFunction works, but only because myweight (with useless random
weights, but that is not the point here) is found from the calling
environment.

badFunction fails after we remove myweight from there:

:~> cat /tmp/philipp.R | R --slave
Ok
Error in eval(expr, envir, enclos) : object "myweight" not found
Execution halted
:~>

As I was able to replicate it, I reported this to the package maintainer. It
turns out that seemingly all is well as this is supposed to work this way,
and I got a friendly pointer to study model.frame and its help page.

Now I am stuck as I can't make sense of model.frame -- see badFunction
above. I would greatly appreciate any help in making rpart work with a local
argument weights so that I can tell Philipp that there is no bug. :)

Regards, Dirk

-- 
Hell, there are no rules here - we're trying to accomplish something.
  -- Thomas A. Edison

On 6/15/07, Dirk Eddelbuettel <edd at debian.org> wrote:
> [...]
> Now I am stuck as I can't make sense of model.frame -- see badFunction
> above.
> I would greatly appreciate any help in making rpart work with a local
> argument weights so that I can tell Philipp that there is no bug. :)

I don't know if ?model.frame is the best page to look at. There's a more
detailed description at

http://developer.r-project.org/nonstandard-eval.pdf

but here are the non-standard evaluation rules as I understand them:

given a name in either (1) the formula or (2) ``special'' arguments like
'weights' in this case, or 'subset', try to find the name

1. in 'data'
2. failing that, in environment(formula)
3. failing that, in the enclosing environment, and so on.

By 'name', I mean a symbol, such as 'Age' or 'myweight'.

So basically, everything is as you would expect if the name is visible in
data, but if not, the search starts in the environment of the formula, not
the environment where the function call is being made (which is the
standard evaluation behaviour). This is a feature, not a bug (things would
be a lot more confusing if it were the other way round).

With this in mind, either of the following might do what you want:

badFunction <- function(mydata, myformula) {
  mydata$myweight <- abs(rnorm(nrow(mydata)))
  hyp <- rpart(myformula,
               data=mydata,
               weights=myweight,
               method="class")
  prev <- hyp
}

badFunction <- function(mydata, myformula) {
  myweight <- abs(rnorm(nrow(mydata)))
  environment(myformula) <- environment()
  hyp <- rpart(myformula,
               data=mydata,
               weights=myweight,
               method="class")
  prev <- hyp
}

-Deepayan

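The lookup order described above can be checked without rpart. What follows is only a sketch under stated assumptions: lm() and the built-in 'cars' data stand in for rpart() and kyphosis, on the assumption that both functions pass 'weights' on to model.frame() in the same way. The function names and the variable obs_wts are hypothetical and appear nowhere in the thread.

```r
## Illustration only: lm() and 'cars' are stand-ins for rpart() and kyphosis.
## fit_fails(), fit_in_data(), fit_via_env() and obs_wts are hypothetical names.

f <- dist ~ speed   # formula created at top level, so it carries that environment

## Weights exist only as a local variable: not in 'data', and not reachable
## from environment(f), so model.frame() cannot find them.
fit_fails <- function(d, f) {
  obs_wts <- rep(c(1, 2), length.out = nrow(d))
  lm(f, data = d, weights = obs_wts)
}

## Rule 1: put the weights into 'data', which is searched first.
fit_in_data <- function(d, f) {
  d$obs_wts <- rep(c(1, 2), length.out = nrow(d))
  lm(f, data = d, weights = obs_wts)
}

## Rule 2: re-point the (local copy of the) formula at this function's frame.
fit_via_env <- function(d, f) {
  obs_wts <- rep(c(1, 2), length.out = nrow(d))
  environment(f) <- environment()
  lm(f, data = d, weights = obs_wts)
}

## Capture the error instead of stopping the script.
res  <- tryCatch(fit_fails(cars, f), error = function(e) e)
fit1 <- fit_in_data(cars, f)
fit2 <- fit_via_env(cars, f)

inherits(res, "error")                 # did the first variant fail?
all.equal(coef(fit1), coef(fit2))      # do both fixes give the same fit?
```

Both fixes use identical weights, so the two fitted models should agree. Which one to prefer is a matter of taste: adding a column to the data leaves the formula untouched, while resetting the formula's environment changes where every other free name in the formula is looked up as well.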
On Fri, 2007-06-15 at 10:47 -0500, Dirk Eddelbuettel wrote:
> [...]
> Now I am stuck as I can't make sense of model.frame -- see badFunction
> above.
> I would greatly appreciate any help in making rpart work with a local
> argument weights so that I can tell Philipp that there is no bug. :)
>
> Regards, Dirk

Dirk,

As you note, the issue is the non-standard evaluation of the arguments in
model.frame().

The key section of the Details in ?model.frame is:

  All the variables in formula, subset and in ... are looked for first in
  data and then in the environment of formula (see the help for formula()
  for further details) and collected into a data frame. Then the subset
  expression is evaluated, and it is used as a row index to the data
  frame. Then the na.action function is applied to the data frame (and may
  well add attributes). The levels of any factors in the data frame are
  adjusted according to the drop.unused.levels and xlev arguments.

Note that even with your goodFunction(), if 'myweight' is created within
the environment of the function and not in the global environment, it
still fails:

library(rpart)
myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis

goodFunction <- function(mydata, myformula) {
  myweight <- abs(rnorm(nrow(mydata)))
  hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
  prev <- hyp
}

> goodFunction(mydata, myformula)
Error in eval(expr, envir, enclos) : object "myweight" not found

However, now let's do this:

library(rpart)
myformula <- formula(Kyphosis ~ Age + Number + Start)
mydata <- kyphosis
myweight <- abs(rnorm(nrow(mydata)))

goodFunction <- function(mydata, myformula) {
  hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
  prev <- hyp
}

> goodFunction(mydata, myformula)
>

It works, because 'myweight' is found in the global environment, which is
where the formula is created.

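To see which environment a formula carries, environment() can be applied to it directly. The following is a minimal sketch rather than anything from the original example: make_formula() and the variable names are hypothetical and exist only for illustration.

```r
## Sketch: a formula remembers the environment in which it was created.
## make_formula(), f_top, f_local and local_wts are hypothetical names.

top_env <- environment()        # the environment this script is evaluated in
f_top   <- y ~ x                # formula created at this level

make_formula <- function() {
  local_wts <- c(1, 2, 3)       # local variable, not visible from outside
  y ~ x                         # formula created inside the function call
}
f_local <- make_formula()

## The first formula points at the environment where the script runs ...
identical(environment(f_top), top_env)

## ... the second points at the frame of the make_formula() call instead,
## and that frame still holds the local variable.
identical(environment(f_local), top_env)
exists("local_wts", envir = environment(f_local), inherits = FALSE)
```

This is why moving the formula's creation into the function changes what model.frame() can see: the names are resolved relative to environment(formula), not relative to the caller.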
Now, final example, try this:

library(rpart)

goodFunction <- function() {
  myformula <- formula(Kyphosis ~ Age + Number + Start)
  mydata <- kyphosis
  myweight <- abs(rnorm(nrow(mydata)))

  hyp <- rpart(myformula, data=mydata, weights=myweight, method="class")
  prev <- hyp
}

> goodFunction()
>

It works because the formula is created within the environment of the
function and hence, 'myweight', which is created there as well, is found.

There was a (non) bug filed on a related matter dealing with the
evaluation of 'subset':

http://bugs.r-project.org/cgi-bin/R/feature%26FAQ?id=3671

and you might find this document on Non-Standard Evaluation helpful:

http://developer.r-project.org/nonstandard-eval.pdf

HTH,

Marc