Cubist is a rule-based machine learning model for regression. Parts of the Cubist model are described in:

  Quinlan. Learning with continuous classes. Proceedings of the 5th Australian Joint Conference On Artificial Intelligence (1992) pp. 343-348

  Quinlan. Combining instance-based and model-based learning. Proceedings of the Tenth International Conference on Machine Learning (1993) pp. 236-243

RuleQuest, the company that created the program, now has a version available under the GPL at:

  http://rulequest.com/cubist-info.html

We've taken the Cubist GPL code and created an R interface. The package locations are:

  http://cran.r-project.org/web/packages/Cubist/index.html

and

  https://r-forge.r-project.org/projects/rulebasedmodels/

The primary functions are cubist(), which creates the rules and the terminal (linear) models, and predict.cubist(), which predicts new outcomes.

The model allows for instance-based corrections of the model predictions. We've separated the instance-based correction from the model build, so the number of instances only has to be chosen when samples are predicted, not when the model is fit.

An interface for tuning the Cubist model will be available in the caret package shortly.

We are also working on a similar port of C5.0 (also GPL'ed). The C code is very similar, so many of the Cubist changes can be carried over. That said, we'd appreciate help if anyone wants to contribute.
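
To illustrate the separation of the instance-based correction from the model build, here is a minimal sketch. It assumes the `neighbors` argument of predict.cubist() as found in the CRAN release of the package; the 400-row training split and the value of 5 neighbors are arbitrary choices made only for this illustration.

```r
library(Cubist)
library(mlbench)
data(BostonHousing)

## Hold out some rows so there is something new to predict
## (the split size is an arbitrary choice for this sketch).
set.seed(1)
in_train <- sample(nrow(BostonHousing), 400)
train_x <- BostonHousing[in_train, -14]
train_y <- BostonHousing$medv[in_train]
test_x  <- BostonHousing[-in_train, -14]

## Build the rules and terminal models once; no instances are chosen here.
mod <- cubist(x = train_x, y = train_y)

## Predictions from the rules alone (no instance-based correction).
pred_rules <- predict(mod, newdata = test_x)

## Same fitted model, now corrected using the 5 nearest training cases.
## Assumption: `neighbors` is the predict.cubist() argument for this.
pred_nn5 <- predict(mod, newdata = test_x, neighbors = 5)

## Compare the two sets of predictions for the first few held-out rows.
head(data.frame(rules = pred_rules, neighbors5 = pred_nn5))
```

Because the correction is applied at prediction time, the same fitted object can be reused with different numbers of neighbors without refitting.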

Here is an example Cubist session:

library(Cubist)
library(mlbench)
data(BostonHousing)

## 1 committee and no instance-based correction, so just an M5 fit:
mod1 <- cubist(x = BostonHousing[, -14], y = BostonHousing$medv)
summary(mod1)

## example output:
## Cubist [Release 2.07 GPL Edition]  Sun Apr 10 17:36:56 2011
## ---------------------------------
##
##     Target attribute `outcome'
##
## Read 506 cases (14 attributes) from undefined.data
##
## Model:
##
##   Rule 1: [101 cases, mean 13.84, range 5 to 27.5, est err 1.98]
##
##     if
##         nox > 0.668
##     then
##         outcome = -1.11 + 2.93 dis + 21.4 nox - 0.33 lstat + 0.008 b
##                   - 0.13 ptratio - 0.02 crim - 0.003 age + 0.1 rm
##
##   Rule 2: [203 cases, mean 19.42, range 7 to 31, est err 2.10]
##
##     if
##         nox <= 0.668
##         lstat > 9.59
##     then
##         outcome = 23.57 + 3.1 rm - 0.81 dis - 0.71 ptratio - 0.048 age
##                   - 0.15 lstat + 0.01 b - 0.0041 tax - 5.2 nox + 0.05 crim
##                   + 0.02 rad
##
##   Rule 3: [43 cases, mean 24.00, range 11.9 to 50, est err 2.56]
##
##     if
##         rm <= 6.226
##         lstat <= 9.59
##     then
##         outcome = 1.18 + 3.83 crim + 4.3 rm - 0.06 age - 0.11 lstat - 0.003 tax
##                   - 0.09 dis - 0.08 ptratio
##
##   Rule 4: [163 cases, mean 31.46, range 16.5 to 50, est err 2.78]
##
##     if
##         rm > 6.226
##         lstat <= 9.59
##     then
##         outcome = -4.71 + 2.22 crim + 9.2 rm - 0.83 lstat - 0.0182 tax
##                   - 0.72 ptratio - 0.71 dis - 0.04 age + 0.03 rad - 1.7 nox
##                   + 0.008 zn
##
##
## Evaluation on training data (506 cases):
##
##     Average  |error|           2.07
##     Relative |error|           0.31
##     Correlation coefficient    0.94
##
##
##     Attribute usage:
##       Conds  Model
##
##        80%   100%    lstat
##        60%    92%    nox
##        40%   100%    rm
##              100%    crim
##              100%    age
##              100%    dis
##              100%    ptratio
##               80%    tax
##               72%    rad
##               60%    b
##               32%    zn
##
##
## Time: 0.0 secs

Thanks,

Max, Steve and Chris