Prof Brian Ripley
2003-Nov-20 09:33 UTC
[Rd] Avoiding scoping problems with model fit objects
A week or so ago we had a query as to why an example not unlike

    foo <- c(1,1,0,0,1,1)
    rep <- 1:6
    m <- multinom(foo ~ rep)
    summary(m)

failed. There was little special about multinom here, as

    m <- lm(foo ~ rep, model=FALSE)
    model.matrix(m)

also failed. In tracking this down a couple of lessons have emerged.

There is a useful paper on `non-standard evaluation' by Thomas Lumley on http://developer.r-project.org, but we need to dig a bit deeper.

Since ca 1.2.0 or so the environment of the formula of a model fit has been one of the places used to look for the data used in that model fit. Unfortunately, it seems to have been assumed that object$call$formula would give the environment: it does give the formula but not the environment, whereas object$terms does usually give the environment (and in some cases object$formula does too). (Note that there is a danger lurking here: if there is no environment set, environment(foo) will give NULL, and that is the base package/namespace.)

The final port of call to recreate the data is the parent env. In this case model.frame() is called from model.matrix.default. So the search for `rep' starts in model.matrix.default, and as that is in the base namespace, it looks in the namespace before the user's workspace, and so finds the function rep rather than the user's variable.

What one really wants to do is to look in the environment of the original model fit. We could keep a reference to that, but

- its contents might have changed and
- it would get saved with the object, probably bloating the saved session.

There is a better way, to save the model frame on the model object, which is why the example above has non-default args. So:

Lesson 1

Supply a model= argument in your model-fitting functions and consider having model=TRUE as the default. (I have added this in a few places in R-devel and my own packages, including to multinom.) Also ensure that all the useful information is in the model frame, not just variables needed in the formula but e.g. subset and weights.
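The points above about where the formula's environment is kept, and about saving the model frame on the fit, can be sketched in a few lines of R. This is only an illustration under stated assumptions: it uses lm() rather than multinom(), it reflects the behaviour of current R releases rather than the version under discussion, and fit_local, keep_frame and wts are hypothetical names introduced for the example. The fit is done inside a function so that the data exist only in that function's frame, not in the workspace.

```r
# Hypothetical helper: fit a model whose variables are local to this function.
fit_local <- function(keep_frame) {
  foo <- c(1, 1, 0, 0, 1, 1)
  rep <- 1:6                    # same name as base::rep, as in the example above
  wts <- c(2, 1, 1, 1, 1, 2)    # hypothetical case weights
  lm(foo ~ rep, weights = wts, model = keep_frame)
}

m_without <- fit_local(FALSE)   # model frame not stored on the fit
m_with    <- fit_local(TRUE)    # model frame stored on the fit

# The formula held in the call is an unevaluated language object,
# so it carries no environment.
env_from_call <- environment(m_without$call$formula)
print(is.null(env_from_call))

# The terms component keeps the environment in which the formula was created,
# here the frame of fit_local(), where the variable `rep` lives.
env_from_terms <- environment(m_without$terms)
print(exists("rep", envir = env_from_terms, inherits = FALSE))

# With model = FALSE nothing is saved; with model = TRUE the frame holds the
# formula variables and the weights, so no later search for the data is needed.
print(is.null(m_without$model))
print(names(m_with$model))
```

With the frame stored, model.frame(m_with) simply returns m_with$model, so later changes to (or removal of) the original variables cannot affect methods that work from the model frame.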
Lesson 2

If you have a model.frame method in your package(s), please review it in the light of the version of model.frame.lm in R-devel. You need to ensure that

- a saved model frame is used if appropriate,
- the original environment(formula) is found correctly,
- that arguments such as data and subset are not ignored.

I have added code to model.frame.default which may make most of the simpler model.frame methods unnecessary.

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595