This issue has arisen within my anova.coxph routine, but is as easily illustrated with glm. testdata <- data.frame(y= 1:5, n= c(8,10,6,20,14), sex = c(0,1,0,1,1), age = c(30,20,35,25,40)) fit <- glm(cbind(y,n) ~ age + sex, binomial, data=testdata, model=TRUE) saveit <- fit$model update(fit, .~. - age, data=saveit) Error in cbind(y, n) : object 'y' not found One would hope that a saved model frame is precisely the thing that would work best. The issue of course is that "cbind(y, n)" is the name of the first variable in saveit, and it is not being properly quoted somewhere down the line. The same issue can occur on the right hand side. "Save the model frame in case you need to refit something next month" is does not appear to be a safe approach to reproducable research. fit2 <- glm(y ~ sex + log(age), poisson, testdata) save2 <- fit2$model update(fit2, . ~ . - sex, data=save2) # fails glm(y ~ log(age), poisson, save2) # fails I can work around this in my anova, but I wanted to not rebuild the frame if it is already present. It looks like model.matrix plus attr(x, 'assign') time -- a bit harder to read, but that looks like what anova.glm is doing. Is there a way to make update work? The current code, BTW, starts by building its own frame using results of terms.inner, which solves the above issue nicely and update() works as expected. But it isn't robust to scoping issues. (As pointed out yesterday by a user: lapply of a function that contained coxph followed by anova gives a variable not found error.) It also ignores saved model frames; thus the rewrite. Terry T
> "Save the model frame in case you need to refit something next month" > does not appear to be a safe approach to reproducible research.Is this a standard recommendation? It will not work in many cases. E.g., if you use lm() to model the sum of some variables the model.frame contains only the sum, not the addends so you cannot later change an addend and refit the model. > d <- data.frame(y1=1:5,y2=sin(1:5),x1=log(1:5)) > fit <- lm(y1+y2 ~ x1, data=d, model=TRUE) > fit$model y1 + y2 x1 1 1.841471 0.0000000 2 2.909297 0.6931472 3 3.141120 1.0986123 4 3.243198 1.3862944 5 4.041076 1.6094379 (The same happens if you use a function like abs(x) on the right side of the formula.) Bill Dunlap TIBCO Software wdunlap tibco.com On Thu, Apr 23, 2015 at 9:58 AM, Therneau, Terry M., Ph.D. < therneau at mayo.edu> wrote:> This issue has arisen within my anova.coxph routine, but is as easily > illustrated with glm. > > testdata <- data.frame(y= 1:5, > n= c(8,10,6,20,14), > sex = c(0,1,0,1,1), > age = c(30,20,35,25,40)) > > fit <- glm(cbind(y,n) ~ age + sex, binomial, data=testdata, model=TRUE) > saveit <- fit$model > > update(fit, .~. - age, data=saveit) > Error in cbind(y, n) : object 'y' not found > > One would hope that a saved model frame is precisely the thing that would > work best. The issue of course is that "cbind(y, n)" is the name of the > first variable in saveit, and it is not being properly quoted somewhere > down the line. The same issue can occur on the right hand side. "Save the > model frame in case you need to refit something next month" is does not > appear to be a safe approach to reproducable research. > > fit2 <- glm(y ~ sex + log(age), poisson, testdata) > save2 <- fit2$model > update(fit2, . ~ . - sex, data=save2) # fails > glm(y ~ log(age), poisson, save2) # fails > > > I can work around this in my anova, but I wanted to not rebuild the frame > if it is already present. It looks like model.matrix plus attr(x, > 'assign') time -- a bit harder to read, but that looks like what anova.glm > is doing. Is there a way to make update work? > > The current code, BTW, starts by building its own frame using results of > terms.inner, which solves the above issue nicely and update() works as > expected. But it isn't robust to scoping issues. (As pointed out > yesterday by a user: lapply of a function that contained coxph followed by > anova gives a variable not found error.) It also ignores saved model > frames; thus the rewrite. > > Terry T > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >[[alternative HTML version deleted]]
Therneau, Terry M., Ph.D. <therneau <at> mayo.edu> writes:> This issue has arisen within my anova.coxph routine, but is as > easily illustrated with glm.> testdata <- data.frame(y= 1:5, > n= c(8,10,6,20,14), > sex = c(0,1,0,1,1), > age = c(30,20,35,25,40)) > > fit <- glm(cbind(y,n) ~ age + sex, binomial, data=testdata, model=TRUE) > saveit <- fit$model > > update(fit, .~. - age, data=saveit) > Error in cbind(y, n) : object 'y' not found> One would hope that a saved model frame is precisely the thing that > would work best. The issue of course is that "cbind(y, n)" is the > name of the first variable in saveit, and it is not being properly > quoted somewhere down the line. The same issue can occur on the > right hand side. "Save the model frame in case you need to refit > something next month" is does not appear to be a safe approach to > reproducable research.> fit2 <- glm(y ~ sex + log(age), poisson, testdata) > save2 <- fit2$model > update(fit2, . ~ . - sex, data=save2) # fails > glm(y ~ log(age), poisson, save2) # fails> I can work around this in my anova, but I wanted to not rebuild the > frame if it is already present. It looks like model.matrix plus > attr(x, 'assign') time -- a bit harder to read, but that looks like > what anova.glm is doing. Is there a way to make update work?> The current code, BTW, starts by building its own frame using > results of terms.inner, which solves the above issue nicely and > update() works as expected. But it isn't robust to scoping issues. > (As pointed out yesterday by a user: lapply of a function that > contained coxph followed by anova gives a variable not found error.) > It also ignores saved model frames; thus the rewrite. Terry TI started to complain about this sort of thing last month, at http://article.gmane.org/gmane.comp.lang.r.devel/37805 I was speaking of updating more generally (which might change anything, not just the formula), but I complained that * we could try to use the model frame (which is stored already), but there are issues with this (the basis of a whole separate rant) because the model frame stores something in between predictor variables and input variables. For example d <- data.frame(y=1:10,x=runif(10)) names(model.frame(lm(y~log(x),data=d))) ## "y" "log(x)" So if we wanted to do something like update to "y ~ sqrt(x)", it wouldn't work ... A mini-version of that rant is that I find the model frame structure quite awkward -- it contains neither what I would call _input_ variables (actual measured things that live in columns in the original data frame) nor _predictor_ variables (columns associated with parameters in the model, i.e. columns of the model matrix). It seems to me that life would be much easier if the model frame contained just the intersection of all.vars(formula) with the column names of the original data set (and with NA values dropped according to na.action()). I appreciate that this is a potentially difficult design problem with ramifications that I don't understand, but ... Martin Maechler tried to explain the rationale for the design to me once, but I didn't manage to understand his argument (so I have since forgotten it). On the other end, it would be nice (in some dream world) to have the capability to associate a model with a *reference* to a model frame; consider the situation where one is fitting 10 or 20 different models to a large data set, ending up with many copies (I'm not 100% sure, but I think that using model.frame() will end up creating an internal copy of the data even if it's not technically modified) of the same gigantic data ... Ben Bolker
On Thu, 23 Apr 2015, Therneau, Terry M., Ph.D. wrote:> This issue has arisen within my anova.coxph routine, but is as easily > illustrated with glm. > > testdata <- data.frame(y= 1:5, > n= c(8,10,6,20,14), > sex = c(0,1,0,1,1), > age = c(30,20,35,25,40)) > > fit <- glm(cbind(y,n) ~ age + sex, binomial, data=testdata, model=TRUE) > saveit <- fit$model > > update(fit, .~. - age, data=saveit) > Error in cbind(y, n) : object 'y' not found > > One would hope that a saved model frame is precisely the thing that > would work best. The issue of course is that "cbind(y, n)" is the name > of the first variable in saveit, and it is not being properly quoted > somewhere down the line. The same issue can occur on the right hand > side. "Save the model frame in case you need to refit something next > month" is does not appear to be a safe approach to reproducable > research.If you want to re-run the same call (which is what the default update method does), then you need to save the original data not the pre-processed model frame. However, the model frame still has all the information you need - but has already evaluated all transformations (cbind, log, etc.).> fit2 <- glm(y ~ sex + log(age), poisson, testdata) > save2 <- fit2$model > update(fit2, . ~ . - sex, data=save2) # fails > glm(y ~ log(age), poisson, save2) # fails > > I can work around this in my anova, but I wanted to not rebuild the > frame if it is already present. It looks like model.matrix plus attr(x, > 'assign') time -- a bit harder to read, but that looks like what > anova.glm is doing. Is there a way to make update work?If you have the model.frame() plus an update formula you could internally extract the response, model matrix and weights, e.g., ## original fit fit <- glm(cbind(y,n) ~ log(age) + sex, family = binomial, data = testdata) ## original model frame mf <- fit$model ## update formula up <- . ~ . - sex ## new terms mt <- update(terms(mf), up) ## response y, matrix x, weights w (NULL here) y <- model.response(mf) x <- model.matrix(mt, mf) w <- model.weights(mf) ## offset could be added similarly And then you can call glm.fit() or coxph.fit() if you can get the family from the original fit. One should still check that the update formula does not change the response (to something which may or may not exist in the model frame). Possibly, one could try to get certain control arguments from the original glm fit (but probably not 'start'). Maybe this helps... (But I can understand that the issues of data frame vs. model frame can be quite a nuisance when programming model utilities :))> The current code, BTW, starts by building its own frame using results of > terms.inner, which solves the above issue nicely and update() works as > expected. But it isn't robust to scoping issues. (As pointed out > yesterday by a user: lapply of a function that contained coxph followed > by anova gives a variable not found error.) It also ignores saved model > frames; thus the rewrite. > > Terry T > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel >
Maybe Matching Threads
- issues with calling predict.coxph.penal (survival) inside a function
- Time-dependent coefficients in a Cox model with categorical variants
- basehaz() in package survival and warnings with coxph
- basehaz() in package 'Survival' and warnings() with coxph
- coxme frailty model standard errors?