Achim Zeileis
2015-Jun-26 14:41 UTC
[Rd] [R-pkg-devel] Guidelines for S3 regression models
Stephen,

thanks for your effort. The more appropriate list for this discussion is probably R-devel (as far as I understand it), so I've moved the discussion there.

Related topics have already been discussed in the past. Specifically, I remember contributions by Paul Johnson ("rockchalk" package) and John Fox ("effects" and "car" packages), as their packages also provide generic infrastructure for visualizing models and carrying out inference. I also have some related packages, such as "lmtest", "sandwich", "strucchange", and "multcomp". Exporting tables of regression coefficients in a modular way via "texreg" or "memisc" could also be added.

> Once we have built a regression model, we typically want to use the
> model for further processing, such as making predictions from the model
> or plotting the residuals. Unfortunately, for many packages on CRAN
> this can be difficult.
>
> For example, some models don't have a residuals method and don't save
> the call or data --- so you can't tell how to generate the residuals
> from the model object itself.
>
> A common snag is that for some models the new data for predict() has to
> be a matrix; for others it has to be a data.frame. This places an
> unnecessary burden on the user when both data.frames and matrices can
> easily be supported by predict.
>
> To mitigate such issues, I'm going out on a limb and presenting some
> guidelines for writers of S3 regression model functions (this document
> is currently part of the plotmo package):

I think this is a nice and useful starting point. It's probably not comprehensive (yet) but will surely help.

You could add something more about writing the formula interface and the correct processing of model.frame, terms, model.response, model.matrix, model.weights, model.offset. Especially for models with linear predictors, the latter two can be very useful and are often not hard to implement. In case the model has multiple parts or multiple responses, the "Formula" package (and its vignette) might also be helpful.

As for the S3 methods, I would omit coefficients, fitted.values, and resid from the list. These dispatch to coef, fitted, and residuals anyway. For inference it would also be very useful to add nobs(), df.residual(), vcov(), and logLik() and/or deviance() where applicable. An overview that lists some (but not all) useful methods is in Table 1 of vignette("betareg", package = "betareg").

For coef() and vcov() it is important that the names and dimensions match. Then Wald tests can be easily computed in functions like car::linearHypothesis(), car::deltaMethod(), lmtest::waldtest(), or lmtest::coeftest().

Thanks & best wishes,
Achim

> http://www.milbo.org/doc/modguide.pdf
>
> Your comments would be appreciated.
>
> Stephen Milborrow
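
The suggestions in the message above (standard model.frame processing with weights and an offset, plus coef(), vcov(), nobs(), df.residual(), and deviance() methods with matching names) can be sketched roughly as follows. This is only an illustration, not code from the guidelines document or from any of the packages mentioned: the fitter name mylm and its internals are hypothetical, it fits plain weighted least squares, and it assumes strictly positive weights and a full-rank design.

```r
# Hypothetical fitter "mylm": weighted least squares with a formula interface.
mylm <- function(formula, data, weights, offset, ...) {
  cl <- match.call()
  # Build the model frame via the usual lm()-style idiom, so that 'weights'
  # and 'offset' are looked up in 'data' first, then in the caller's frame.
  mf <- match.call(expand.dots = FALSE)
  m <- match(c("formula", "data", "weights", "offset"), names(mf), 0L)
  mf <- mf[c(1L, m)]
  mf[[1L]] <- quote(stats::model.frame)
  mf <- eval(mf, parent.frame())

  mt <- attr(mf, "terms")
  y <- model.response(mf, "numeric")
  x <- model.matrix(mt, mf)
  w <- model.weights(mf)
  if (is.null(w)) w <- rep(1, NROW(y))
  o <- model.offset(mf)
  if (is.null(o)) o <- rep(0, NROW(y))

  # Weighted normal equations (assumes positive weights, full column rank).
  xtwx <- crossprod(x, w * x)
  beta <- drop(solve(xtwx, crossprod(x, w * (y - o))))
  names(beta) <- colnames(x)

  fitted <- drop(x %*% beta) + o
  res <- y - fitted
  df <- NROW(x) - NCOL(x)
  v <- solve(xtwx) * sum(w * res^2) / df
  dimnames(v) <- list(colnames(x), colnames(x))  # same names as coef()

  structure(list(coefficients = beta, vcov = v, fitted.values = fitted,
                 residuals = res, weights = w, df.residual = df,
                 nobs = NROW(x), terms = mt, call = cl),
            class = "mylm")
}

# coef(), fitted(), residuals(), and df.residual() already work through their
# default methods because the components use the conventional names.
vcov.mylm <- function(object, ...) object$vcov
nobs.mylm <- function(object, ...) object$nobs
deviance.mylm <- function(object, ...) sum(object$weights * object$residuals^2)

fit <- mylm(dist ~ speed, data = cars)
# Wald z statistics need only coef() and vcov() with matching names.
wald <- coef(fit) / sqrt(diag(vcov(fit)))
print(wald)
```

Because coef() and vcov() agree in names and dimensions and df.residual() is available, generic tools such as lmtest::coeftest(fit) should be able to work with an object like this, although that has not been checked here against every such function.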
Given how much documentation is available on R coding in general, it is surprising how little is available specifically on writing model code. Researchers who come up with a new method of regression, and who want to write an S3 model for that method, must currently go all the way back to the Venables and Ripley S programming book.

> On 26.06.2015 14:09, Stephen Milborrow wrote:
> > Once we have built a regression model, we typically want to use the
> > model for further processing, such as making predictions from the model
> > or plotting the residuals. Unfortunately, for many packages on CRAN
> > this can be difficult.
> >
> > For example, some models don't have a residuals method and don't save
> > the call or data --- so you can't tell how to generate the residuals
> > from the model object itself.
> >
> > A common snag is that for some models the new data for predict() has to
> > be a matrix; for others it has to be a data.frame. This places an
> > unnecessary burden on the user when both data.frames and matrices can
> > easily be supported by predict.
> >
> > To mitigate such issues, I'm going out on a limb and presenting some
> > guidelines for writers of S3 regression model functions (this document
> > is currently part of the plotmo package):
> >
> > http://www.milbo.org/doc/modguide.pdf
> >

On 26.06.2015 16:41, Achim Zeileis wrote:
> I think this is a nice and useful starting point. It's probably not
> comprehensive (yet) but will surely help.
>
> You could add something more about writing the formula interface and the
> correct processing of model.frame, terms, model.response, model.matrix,
> model.weights, model.offset. Especially for models with linear predictors
> the latter two can be very useful and are often not hard to implement. In
> case the model has multiple parts or multiple responses, the "Formula"
> package (and its vignette) might also be helpful.
>
> As for the S3 methods, I would omit coefficients, fitted.values, and resid
> from the list. These dispatch to coef, fitted, and residuals anyway. For
> inference it would also be very useful to add nobs(), df.residual(),
> vcov(), and logLik() and/or deviance() where applicable. An overview which
> lists some (but not all) useful methods is in Table 1 of
> vignette("betareg", package = "betareg").
>
> For coef() and vcov() it is useful/important that the names and dimension
> match. Then Wald tests can be easily computed in functions like
> car::linearHypothesis(), car::deltaMethod(), lmtest::waldtest(), or
> lmtest::coeftest().

Thanks for these, I'll update the document.

Stephen Milborrow
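
The predict() snag from the original post (some models want a matrix for new data, others a data.frame) can be avoided with a few lines in the predict method. The following is a minimal sketch under stated assumptions, not the approach taken in the guidelines document: the class simplefit and its fitter are hypothetical, and a matrix passed as newdata is assumed to have column names matching the predictors.

```r
# Hypothetical minimal model: ordinary least squares that keeps its terms.
simplefit <- function(formula, data) {
  mf <- model.frame(formula, data)
  mt <- attr(mf, "terms")
  x <- model.matrix(mt, mf)
  fit <- lm.fit(x, model.response(mf))
  structure(list(coefficients = fit$coefficients, terms = mt,
                 xlevels = .getXlevels(mt, mf), call = match.call()),
            class = "simplefit")
}

predict.simplefit <- function(object, newdata, ...) {
  if (missing(newdata)) stop("this sketch requires 'newdata'")
  # Accept a matrix as well as a data.frame: convert, then treat both alike.
  if (is.matrix(newdata)) {
    if (is.null(colnames(newdata)))
      stop("a matrix 'newdata' needs column names matching the predictors")
    newdata <- as.data.frame(newdata)
  }
  tt <- delete.response(object$terms)
  mf <- model.frame(tt, newdata, xlev = object$xlevels)
  x <- model.matrix(tt, mf)
  drop(x %*% object$coefficients)
}

fit <- simplefit(dist ~ speed, data = cars)
nd_df <- data.frame(speed = c(10, 20))   # new data as a data.frame
nd_mat <- cbind(speed = c(10, 20))       # the same new data as a matrix
print(predict(fit, nd_df))
print(predict(fit, nd_mat))
```

Storing the terms object (and the factor levels) in the fitted model is what makes this possible: the method can rebuild the design matrix from whatever the user supplies, so the conversion cost stays inside the package rather than with the user.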