Benjamin Hofner
2010-Mar-19 10:47 UTC
[R] mboost: Interpreting coefficients from glmboost if center=TRUE
Sorry for the tardy reply, but I only came across your posting by chance today.

To make a long story short: you are right about the centering. We forgot to correct the intercept if center = TRUE. We recently found the problem ourselves and fixed it in the current version (mboost 2.0-3). However, the problem only occurred when you extracted the coefficients. As the intercept is rarely interpretable, we didn't have a closer look, and thus it took some time until we found this bug. The model predictions, however, always took care of the centering and thus were not affected (as you already pointed out).

As you realized, centering is of high importance if you use glmboost, as it reduces the number of boosting iterations needed to estimate the model and furthermore often improves the estimates. Centering is also important if you use gamboost() and specify linear base-learners without intercept (e.g. bols(x, intercept=FALSE)). However, in this case you have to center the covariates yourself and take care of the intercept correction afterwards.

You wrote that you weren't aware of a mailing list for mboost, but you could write to the maintainer (see http://cran.at.r-project.org/package=mboost) for possible help and/or to report bugs.

HTH
Benjamin

Kyle Werner wrote:
> Thanks for your reply. In fact, I do use the predict method for model
> assessment, and it shows that centering leads to a substantial
> improvement using even the bluntest of assessments of 'goodness'
> (i.e., binary categorization accuracy). So I agree that the package
> authors must have internal tools to reverse the effects of centering
> the variables, at least within the predict method. But it seems to me
> that the coefficients that I get out should be related to the values
> that I input, not to the centered values. In other words, centering
> seems like it should be done "invisibly;" unless I center the
> variables myself, I would expect the coefficients to be applicable to
> the original data.
>
> I extract the coefficients returned by the model and store them in a
> database which is web accessible. I reconstruct models periodically,
> and track various statistics associated with these models in the
> database. This is why I highly value the fact that mboost has
> glmboost, which can return linearly interpretable coefficients. It is
> also why I do not directly call upon R every time I want to query a
> model. (As an aside, if I were to use R directly, I might consider the
> gamboost or blackboost methods, which do not return scalar
> coefficients that are readily extractable.)
>
> On Sun, Feb 7, 2010 at 6:31 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> >
> > On Feb 7, 2010, at 5:03 PM, Kyle Werner wrote:
> >
> >> I'm running R 2.10.1 with mboost 2.0 in order to build predictive
> >> models. I am performing prediction on a binomial outcome, using a
> >> linear function (glmboost). However, I am running into some confusion
> >> regarding centering. (I am not aware of an mboost-specific mailing
> >> list, so if the main R list is not the right place for this topic,
> >> please let me know.)
> >>
> >> The boost_control() function allows for the choice between center=TRUE
> >> and center=FALSE. If I select center=FALSE, I am able to interpret the
> >> coefficients just like those from standard logistic regression.
> >> However, if I select center=TRUE, this is no longer the case. In
> >> theory and in practice with my data, centering improves the
> >> predictions made by the model, so this is an issue worth pursuing for
> >> me.
> >>
> >> Below is output from running the exact same data in exactly the same
> >> way, only differing by whether the "center" bit is flipped or not:
> >>
> >> Output with center=TRUE:
> >> [(Intercept)] => -0.04543632
> >> [painscore] => 0.007553608
> >> [Offset] => -0.546520621809327
> >>
> >> Output with center=FALSE:
> >> [(Intercept)] => -0.989742
> >> [painscore] => 0.001342585
> >> [Offset] => -0.546520621809327
> >>
> >> The mean of painscore is 741. It seems to me that for center=FALSE,
> >> mboost should modify the intercept by subtracting 741*0.007553608 from
> >> it (thus intercept should = -11.285). If I manually do this, the
> >> output is credible, and in the ballpark of that given by other methods
> >> (e.g., lrm or glm with a Binomial link function). If I don't do this,
> >> then the inverse logistic interpretation of the output is off by
> >> orders of magnitude.
> >>
> >> In the end, with "center=TRUE", and I want to make a prediction based
> >> on the coefficients returned by mboost, the results only make sense if
> >> I manually rescale my independent variables prior to making a
> >> prediction. Is this the desired behavior, or am I doing something
> >> wrong?
> >
> > I don't know, but my question is ... why aren't you using the predict
> > method for that sort of object? Presumably the authors of the package
> > know how to recognize the differences in the objects. Testing confirms
> > this to be the case with the first example in the glmboost help page.
> >
> >>
> >> Many thanks.
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> > David Winsemius, MD
> > Heritage Laboratories
> > West Hartford, CT
> >

--
******************************************************************************
Dipl.-Stat. Benjamin Hofner
Institut für Medizininformatik, Biometrie und Epidemiologie
Friedrich-Alexander-Universität Erlangen-Nürnberg
benjamin.hofner at imbe.med.uni-erlangen.de
http://www.imbe.med.uni-erlangen.de/~hofnerb/
http://www.benjaminhofner.de
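
For readers who want to see the intercept correction discussed in this thread worked through, here is a minimal sketch in base R. It is not mboost code: glm() stands in for glmboost(), the data are simulated, and the variable names (x_centered, intercept_raw, posted_intercept and so on) are made up for illustration. The only point is that a model fitted on a centered covariate x - mean(x) maps back to the original scale via intercept_raw = intercept_centered - slope * mean(x), while the slope is unchanged.

```r
# Sketch only: base-R glm() stands in for glmboost(); the data are simulated.
set.seed(1)
n <- 500
painscore <- rnorm(n, mean = 741, sd = 150)   # hypothetical covariate
y <- rbinom(n, size = 1, prob = plogis(-3 + 0.004 * painscore))

# Fit on the centered covariate, as a routine that centers internally would.
x_mean <- mean(painscore)
x_centered <- painscore - x_mean
fit_centered <- glm(y ~ x_centered, family = binomial())

# Intercept correction: shift the intercept back to the original scale.
b <- coef(fit_centered)
slope_raw <- unname(b["x_centered"])
intercept_raw <- unname(b["(Intercept)"]) - slope_raw * x_mean

# Cross-check against a fit on the uncentered covariate.
fit_raw <- glm(y ~ painscore, family = binomial())
print(c(intercept = intercept_raw, slope = slope_raw))
print(coef(fit_raw))

# Same correction applied to the center=TRUE numbers quoted in the thread
# (mean of painscore 741); this evaluates to about -5.6427.
posted_intercept <- -0.04543632 - 741 * 0.007553608
print(posted_intercept)
```

Twice the last value is about -11.285, the figure given in the original question. That would fit the factor of 1/2 that, as far as I recall, the mboost documentation describes for Binomial() coefficients relative to a glm logit fit, but treat that reading as an assumption and check the Family help page of your mboost version.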