Hello,
I tried two different implementations of stochastic gradient
boosting (Friedman 2002): the MART(tm) with R tool
(http://www-stat.stanford.edu/~jhf/R-MART.html) and the gbm R package.
To me the two seemed fairly comparable, apart from the different loss
criteria they offer and the fact that the gbm tool is slightly more
convenient to use. However, the MART with R tool systematically
outperforms the gbm tool in terms of goodness of fit, whatever way I
choose the best iteration for the gbm package. I tried to find specific
options that could explain this, but nothing came out. See below for an
example of how I compare the two implementations. Has anyone had the
same experience? Can anyone give me hints about such performance
differences, or tell me if I am missing something obvious?
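(For reference, by "choosing the best iteration" I mean something along
these lines -- a sketch assuming a fitted gbm object called gbm1, as in
the call below; with cv.folds=1 the "cv" method is not available, so I
rely on the test-set and out-of-bag estimates:)

# sketch: picking the best iteration from a fitted gbm object
best.test <- gbm.perf(gbm1, method="test")  # held-out test set estimate
best.oob  <- gbm.perf(gbm1, method="OOB")   # out-of-bag estimate
pred      <- predict(gbm1, newdata=data, n.trees=best.test)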
Thank you in advance, Manuel
Here are the arguments and options I used for the comparison, working
on a dataset of 1600 records by 15 variables:
# the MART with R tool
lx <- mart( as.matrix(x), y,
            c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2),  # variable types
            niter=1000, tree.size=6, learn.rate=0.01,
            loss.cri=2 )                       # 2: gaussian
# for gbm
gbm1 <- gbm(y ~ v1 + v2 + v3 + v4 + v5 + v6 + v7 + v8 + v9 + v10 +
                v11 + v12 + v13 + v14 + v15,
            data=data, var.monotone=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0),
            distribution="gaussian", n.trees=1000, shrinkage=0.01,
            interaction.depth=6, bag.fraction=0.5, train.fraction=0.5,
            n.minobsinnode=10, cv.folds=1, keep.data=TRUE)
# I then do predictions on the same dataset, and further perform
# goodness-of-fit comparisons
# ...
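(A sketch of the kind of comparison I do, in base R -- pred.mart and
pred.gbm are my names for the two prediction vectors obtained above:)

# sketch: comparing goodness of fit of the two prediction vectors
mse <- function(obs, pred) mean((obs - pred)^2)
r2  <- function(obs, pred) 1 - sum((obs - pred)^2) /
                               sum((obs - mean(obs))^2)
c(mart=mse(y, pred.mart), gbm=mse(y, pred.gbm))
c(mart=r2(y, pred.mart),  gbm=r2(y, pred.gbm))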
Hello,
I have been using two different implementations of stochastic gradient
boosting (Friedman 2002): MART(tm) with R and the gbm package. The two
are fairly comparable, except that MART with R systematically (and,
depending on the dataset, strongly) outperforms the gbm tool in terms
of goodness of fit.
For instance, the following pair of fits
# gbm package
gbm1 <- gbm(Y ~ X2 + X3 + X4 + X5 + X6,
            data=data,
            var.monotone=c(0,0,0,0,0),  # 0: no monotone restrictions
            distribution="gaussian",    # bernoulli, adaboost, gaussian,
                                        # poisson, and coxph available
            n.trees=3000,               # number of trees
            shrinkage=0.005,            # shrinkage or learning rate;
                                        # 0.001 to 0.1 usually work
            interaction.depth=6,        # 1: additive model, 2: two-way
                                        # interactions, etc.
            bag.fraction=0.5,           # subsampling fraction; 0.5 is
                                        # probably best
            train.fraction=0.5,         # fraction of data for training;
                                        # the first train.fraction*N
                                        # records are used
            n.minobsinnode=10,          # minimum total weight needed
                                        # in each node
            cv.folds=5,                 # do 5-fold cross-validation
            keep.data=TRUE,             # keep a copy of the dataset
                                        # with the object
            verbose=TRUE)               # print out progress
# MART with R
X <- as.matrix(cbind(data$X2, as.numeric(data$X3),
                     as.numeric(data$X4), as.numeric(data$X5), data$X6))
Y <- data$Y
mart(X, Y, c(1,2,2,2,1),                # variable types
     niter=3000, tree.size=6, learn.rate=0.005,
     loss.cri=2)                        # 2: gaussian here too
leads to very different goodness of fit (I can provide the dataset if
needed).
Has anyone encountered this before? Is there an explanation, or am I
missing something obvious in the argument settings?
Thank you in advance,
Manuel