Page 409 of "Applied Predictive Modeling" by Max Kuhn states, when
referring to the distribution parameter, that the gbm function can
accommodate only two-class problems.
From the gbm help re: the distribution parameter:
    Currently available options are "gaussian" (squared error),
    "laplace" (absolute loss), "tdist" (t-distribution loss),
    "bernoulli" (logistic regression for 0-1 outcomes),
    "huberized" (huberized hinge loss for 0-1 outcomes),
    "multinomial" (classification when there are more than 2
    classes), "adaboost" (the AdaBoost exponential loss for 0-1
    outcomes), "poisson" (count outcomes), "coxph" (right
    censored observations), "quantile", or "pairwise" (ranking
    measure using the LambdaMart algorithm).
I would have thought that huberized and multinomial would also be
possible. Is that not so? In any case, how would anything different
from bernoulli (the default) be specified when using the caret train
function since distribution appears not to be among the list of
parameters that caret recognises?
> getModelInfo("gbm")[["gbm"]]$parameters
          parameter   class                   label
1           n.trees numeric   # Boosting Iterations
2 interaction.depth numeric          Max Tree Depth
3         shrinkage numeric               Shrinkage
4    n.minobsinnode numeric Min. Terminal Node Size
Is that a limitation of the caret package? Or is there something I'm
not getting?
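
One thing that might be worth trying is sketched below. It assumes that
train() forwards any argument it does not recognise to the underlying
gbm fitting call through its ... argument, so that distribution could be
given directly even though it is not a tuning parameter. The iris data,
the single-row tuning grid and the 3-fold resampling are only
placeholders for illustration, and both caret and gbm are assumed to be
installed. Recent gbm versions may also warn about "multinomial".

```r
library(caret)  # assumes the caret and gbm packages are installed

set.seed(1)

# iris has three classes, so this is a multi-class problem
fit <- train(Species ~ ., data = iris,
             method = "gbm",
             # assumption: passed on to gbm via train's ... argument
             distribution = "multinomial",
             trControl = trainControl(method = "cv", number = 3),
             # one illustrative setting for each of the four tuning parameters
             tuneGrid = expand.grid(n.trees = 50,
                                    interaction.depth = 1,
                                    shrinkage = 0.1,
                                    n.minobsinnode = 5),
             verbose = FALSE)

# the fitted gbm object records which distribution was actually used
print(fit$finalModel$distribution$name)
```

If the printed name matches what was requested, the argument did reach
gbm; if it does not, the assumption above is wrong.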
--
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
___ Patrick Connolly
{~._.~} Great minds discuss ideas
_( Y )_ Average minds discuss events
(:_~*~_:) Small minds discuss people
(_)-(_) ..... Eleanor Roosevelt
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.