I'm attempting to fit boosted regression trees to a censored response using
IPCW (inverse probability of censoring) weights. I've implemented this with
two libraries, mboost and gbm, which I expected to yield comparably
performing models. That is not the case: mboost performs much better, which
seems odd. The issue matters because the output of this regression has to
go into a production system, and mboost doesn't even expose the fitted
ensemble.
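For reference, t.ipcw holds inverse-probability-of-censoring weights. A
minimal sketch of how such weights are commonly constructed (the columns
time and status, and the Kaplan-Meier fit via the survival package, are my
assumptions; the post does not show this step):

```r
# Hedged sketch: IPCW weights from a Kaplan-Meier estimate of the
# censoring distribution.  Assumes tdata has a follow-up time `time` and
# an event indicator `status` (1 = event, 0 = censored); neither column
# appears in the original post.
library(survival)

# Kaplan-Meier estimate of the censoring survival function G(t):
# flip the event indicator so censoring is the "event"
cens.fit <- survfit(Surv(time, 1 - status) ~ 1, data = tdata)
G <- stepfun(cens.fit$time, c(1, cens.fit$surv))

# Observed events are weighted by 1/G(t); censored rows get weight 0.
# pmax() guards against division by a (near-)zero survival estimate.
t.ipcw <- ifelse(tdata$status == 1, 1 / pmax(G(tdata$time), 1e-8), 0)
```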
# default parameters for blackboost: Gaussian loss and maxdepth of 2
library(mboost)
library(gbm)
m.mboost <- blackboost(Y ~ X1 + X2, data = tdata, weights = t.ipcw,
                       control = boost_control(mstop = 100))
m.gbm <- gbm(Y ~ X1 + X2, data = tdata, weights = t.ipcw,
             distribution = "gaussian", interaction.depth = 2,
             bag.fraction = 1, n.trees = 2500)
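One thing worth ruling out is that the two calls rely on different
defaults. A sketch that pins the learning rate and tree size explicitly in
both libraries (nu/shrinkage = 0.1 is my choice, not from the post, and
tree_controls comes from party or partykit depending on the mboost
version):

```r
# Hedged sketch: set the tunables explicitly instead of relying on
# defaults.  blackboost boosts with nu = 0.1 by default, while some gbm
# versions default to shrinkage = 0.001, which makes gbm learn far more
# slowly per tree.
library(mboost)
library(gbm)

m.mboost <- blackboost(Y ~ X1 + X2, data = tdata, weights = t.ipcw,
                       control = boost_control(mstop = 100, nu = 0.1),
                       tree_controls = partykit::ctree_control(maxdepth = 2))

m.gbm <- gbm(Y ~ X1 + X2, data = tdata, weights = t.ipcw,
             distribution = "gaussian", shrinkage = 0.1,
             interaction.depth = 2, bag.fraction = 1, n.trees = 100)
```

Even with matched settings, exact agreement isn't expected: blackboost
grows conditional inference trees while gbm grows CART-style trees, and
gbm's interaction.depth counts splits rather than tree depth.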
# compare IPCW-weighted squared loss
sum((predict(m.mboost, newdata = tdata) - tdata$Y)^2 * t.ipcw) <
  sum((predict(m.gbm, newdata = tdata, n.trees = 2500) - tdata$Y)^2 * t.ipcw)
# TRUE: mboost with 100 trees achieves a loss about 20% lower than gbm
# with 100 trees, and about 5% lower than gbm with 2500 trees
The documentation says blackboost essentially does the same thing as gbm,
so any ideas on what could be driving this large difference in performance?
--
View this message in context:
http://r.789695.n4.nabble.com/mboost-vs-gbm-tp4637518.html
Sent from the R help mailing list archive at Nabble.com.