I am extracting splitting rules from Greg Ridgeway's GBM 1.6-3.2 in R 2.15.2, so I can run classification in a production system outside of R. I have it working and verified for a dummy data set with all variable types (numeric, factor, ordered) and missing values, but in the Titanic survivors data set the splitting rule for factors does not make sense. The attached code and log explain the dilemma. I also tried tracing the predictions backwards (also in the log and code), but that doesn't make sense either. The first record is a female, and predict.gbm() gives values (around 0.0108) that land her on rule 7 in both trees. Rule 7's parent is rule 5, and rule 5's parent is rule 0, but in each tree rule 0 has a different SplitCodePred.

Andrew
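For reference, here is a minimal Python sketch of walking one tree forward outside of R, which may be easier to check than tracing backwards. It assumes the table layout documented for pretty.gbm.tree() (SplitVar, SplitCodePred, LeftNode, RightNode, MissingNode, with 0-based node numbers) and the documented meaning of c.splits (one entry per factor level: -1 goes left, +1 goes right, 0 means the level was not seen in training). Under that reading, SplitCodePred is a prediction only in a terminal row; in a split row it is a numeric threshold or an index into c.splits, which would explain why rule 0 carries a different value in each tree. Sending a 0 entry to the missing node is my assumption. The names walk_tree and raw_score and all numbers below are made up for illustration and are not taken from the Titanic model.

```python
import math

# Toy tree in the assumed pretty.gbm.tree() layout. Each node is
# (SplitVar, SplitCodePred, LeftNode, RightNode, MissingNode).
# SplitVar == -1 marks a terminal node. All values are invented.
NODES = [
    (0, 0.0, 1, 2, 3),         # node 0: factor split, uses C_SPLITS[0]
    (-1, 0.02, -1, -1, -1),    # node 1: terminal
    (1, 10.0, 4, 5, 6),        # node 2: numeric split, x < 10.0 goes left
    (-1, 0.0, -1, -1, -1),     # node 3: terminal (sex missing)
    (-1, 0.01, -1, -1, -1),    # node 4: terminal
    (-1, -0.03, -1, -1, -1),   # node 5: terminal
    (-1, -0.01, -1, -1, -1),   # node 6: terminal (age missing)
]
C_SPLITS = [[-1, 1]]           # level 0 (female) left, level 1 (male) right
IS_FACTOR = [True, False]      # variable 0 = sex (factor), 1 = age (numeric)


def is_missing(x):
    """Treat None and NaN as missing values."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def walk_tree(nodes, c_splits, is_factor, row):
    """Follow one row from node 0 to a terminal node and return its value."""
    node = 0
    while True:
        split_var, split_code, left, right, missing = nodes[node]
        if split_var == -1:
            return split_code              # terminal row: this is the prediction
        x = row[split_var]
        if is_missing(x):
            node = missing
        elif is_factor[split_var]:
            # Split row on a factor: SplitCodePred indexes c_splits,
            # and x is the 0-based level code of the factor.
            direction = c_splits[int(split_code)][x]
            if direction == -1:
                node = left
            elif direction == 1:
                node = right
            else:
                node = missing             # assumption: unseen level
        else:
            # Split row on a numeric variable: SplitCodePred is the threshold.
            node = left if x < split_code else right


def raw_score(init_f, trees, c_splits, is_factor, row):
    """Initial value plus the terminal value reached in every tree."""
    return init_f + sum(walk_tree(t, c_splits, is_factor, row) for t in trees)


female_30 = [0, 30.0]
print(walk_tree(NODES, C_SPLITS, IS_FACTOR, female_30))
print(raw_score(-0.5, [NODES, NODES], C_SPLITS, IS_FACTOR, female_30))
```

Note that under these assumptions a score is the initial value plus one terminal value per tree, so a single terminal value would not be expected to equal the predict.gbm() output unless only one tree is used and the initial value is accounted for.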
The mailing list ate the attachments, so here they are again.

R code: gist.github.com/4270628
Log: pastebin.com/0e49CTsL

Andrew

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Andrew Ziem
Sent: Wednesday, December 12, 2012 10:33 AM
To: r-help at r-project.org
Subject: [R] extracting splitting rules from GBM

I am extracting splitting rules from Greg Ridgeway's GBM 1.6-3.2 in R 2.15.2, so I can run classification in a production system outside of R. I have it working and verified for a dummy data set with all variable types (numeric, factor, ordered) and missing values, but in the Titanic survivors data set the splitting rule for factors does not make sense. The attached code and log explain the dilemma. I also tried tracing the predictions backwards (also in the log and code), but that doesn't make sense either. The first record is a female, and predict.gbm() gives values (around 0.0108) that land her on rule 7 in both trees. Rule 7's parent is rule 5, and rule 5's parent is rule 0, but in each tree rule 0 has a different SplitCodePred.

Andrew