Hi all,

There is now a package available on CRAN that provides an R interface to Leo Breiman's random forest classifier.

Basically, random forest does the following:

1. Select ntree, the number of trees to grow, and mtry, a number no larger than the number of variables.
2. For i = 1 to ntree:
3.    Draw a bootstrap sample from the data. Call those not in the bootstrap sample the "out-of-bag" data.
4.    Grow a "random" tree, where at each node, the best split is chosen among mtry randomly selected variables. The tree is grown to maximum size and not pruned back.
5.    Use the tree to predict out-of-bag data.
6. In the end, use the predictions on out-of-bag data to form majority votes.
7. Prediction of test data is done by majority votes from predictions from the ensemble of trees.

In the tech report http://oz.berkeley.edu/users/breiman/randomforest2001.pdf, Breiman showed that this technique is very competitive with boosting classification trees. In our own experience, it is competitive with nonlinear classifiers such as artificial neural nets and support vector machines. Two of the significant advantages of random forests over other methods (IMHO) are:

a) there is only one parameter (mtry) to adjust, and the result is usually not sensitive to it; and
b) the built-in cross-validation via the use of out-of-bag data gives a quite accurate estimate of test set error, and offers quite effective protection against overfitting.

The code is based on version 3.1 of the original Fortran code written by Breiman and Cutler (http://www.stat.berkeley.edu/users/breiman/). The User Guide for the Fortran code on Breiman's web site explains some of the facilities provided in the code (such as assessing variable importance, and proximity measures). Some facilities provided in the original Fortran code have been taken out: transforming data to principal components, and multidimensional scaling of the "proximity" matrix. These can easily be done in R before and after calls to the random forest functions.
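As a rough illustration of steps 1 to 7 above, here is a minimal sketch in plain Python. It is not the package's code and not Breiman and Cutler's Fortran. The function names (grow_tree, predict_tree, random_forest, predict_forest) are hypothetical, and two details are assumptions, because the announcement does not say how the "best split" is scored: splits are ranked by Gini impurity, and thresholds are midpoints between adjacent observed values.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(X, y, rows, mtry, rng):
    """Grow an unpruned tree on the given row indices (step 4).
    A leaf is a class label; an internal node is (variable, threshold, left, right).
    Class labels must not be tuples."""
    labels = [y[i] for i in rows]
    if len(set(labels)) == 1:
        return labels[0]                                  # pure node: stop
    best = None
    for j in rng.sample(range(len(X[0])), mtry):          # mtry random variables
        values = sorted({X[i][j] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                             # assumed: midpoint threshold
            left = [i for i in rows if X[i][j] <= t]
            right = [i for i in rows if X[i][j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:                                      # chosen variables are constant here
        return Counter(labels).most_common(1)[0][0]
    _, j, t, left, right = best
    return (j, t, grow_tree(X, y, left, mtry, rng), grow_tree(X, y, right, mtry, rng))

def predict_tree(tree, x):
    """Send one observation down a tree and return the leaf label."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if x[j] <= t else right
    return tree

def random_forest(X, y, ntree, mtry, seed=0):
    """Steps 1 to 6: returns the list of trees and the out-of-bag error rate."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(ntree):
        boot = [rng.randrange(n) for _ in range(n)]       # step 3: bootstrap sample
        in_bag = set(boot)
        tree = grow_tree(X, y, boot, mtry, rng)
        trees.append(tree)
        for i in range(n):                                # step 5: predict out-of-bag rows
            if i not in in_bag:
                oob_votes[i][predict_tree(tree, X[i])] += 1
    voted = [i for i in range(n) if oob_votes[i]]         # rows that were out-of-bag at least once
    errors = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in voted)
    oob_error = errors / len(voted) if voted else float("nan")
    return trees, oob_error

def predict_forest(trees, x):
    """Step 7: majority vote over all trees."""
    return Counter(predict_tree(t, x) for t in trees).most_common(1)[0][0]

# Toy data: two well-separated classes, two variables.
X = [[0.1 * i, 0.1 * i] for i in range(10)] + [[10 + 0.1 * i, 10 + 0.1 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
trees, oob_error = random_forest(X, y, ntree=25, mtry=1, seed=1)
print("OOB error:", oob_error, "prediction:", predict_forest(trees, [0.2, 0.3]))
```

Note that the out-of-bag error is computed only from trees that did not see the observation in question, which is why it can stand in for a test-set estimate without a separate hold-out set.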
Random numbers are generated by R's RNG, rather than the one supplied in the original Fortran code.

I'd like to thank Profs. B. D. Ripley, J. Lindsey, and others on R-help who answered many of my questions when I was working on this package. The formula interface and part of the code in the predict method are outright "stolen" from svm() in the e1071 package and nnet() in the VR bundle.

Questions/comments/bugs/patches welcome!

Regards,
Andy

Andy I. Liaw, PhD
Biometrics Research
Merck & Co., Inc.
P.O. Box 2000, RY70-38
Rahway, NJ 07065
Phone: (732) 594-0820
Fax: (732) 594-1565
mailto:andy_liaw at merck.com

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA) that may be confidential, proprietary copyrighted and/or legally privileged, and is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please immediately return this by e-mail and then delete it.
------------------------------------------------------------------------------
r-announce mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-announce-request at stat.math.ethz.ch
------------------------------------------------------------------------------
Hi Andy,

I'm glad to see that someone has put up an R package of Leo's code. I made an R package using his first release of the code, but never had/took the time to push it through the publication review process here so that I could distribute it. I'm glad you have.

-Greg

> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com]
> Sent: Tuesday, April 02, 2002 10:23 AM
> To: 'r-announce at lists.R-project.org'
> Subject: random forests for R

LEGAL NOTICE
Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this E-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents of this E-mail or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately.

------------------------------------------------------------------------------
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
------------------------------------------------------------------------------
Great. I like the algorithm very much, and it does not seem all that difficult to implement. It is somewhat reminiscent of genetic algorithms.

Regards,
Bernd Huwe

-----Original Message-----
From: owner-r-announce at stat.math.ethz.ch [mailto:owner-r-announce at stat.math.ethz.ch] On behalf of Liaw, Andy
Sent: Tuesday, April 2, 2002 17:23
To: 'r-announce at lists.R-project.org'
Subject: random forests for R