Byron Dom
2014-May-16 23:18 UTC
[R] Using unbalanced-learning algorithms in the randomForest package
Responding to my own post/question here. Andy Liaw directed me to this page, which answers my question (it gives examples of the classwt, strata, and sampsize arguments of randomForest):
http://grokbase.com/t/r/r-help/05av0aaa2e/r-repost-examples-of-classwt-strata-and-sampsize-i-n-randomforest

----------------------------------- original post ---------------------------------------------------
Date: Tue, 6 May 2014 22:54:22 -0700 (PDT)
From: Byron Dom <byron_dom at yahoo.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Subject: [R] Using unbalanced-learning algorithms in the randomForest package.
Message-ID: <1399442062.12706.YahooMailNeo at web142801.mail.bf1.yahoo.com>
In archive: https://stat.ethz.ch/pipermail/r-help/2014-May/374384.html

The following report by the authors of the randomForest package describes two algorithm modifications for using random forests to learn classifiers for "unbalanced" learning problems, in which one class is much less frequent than the other (in 2-class problems). These two variations are called "balanced RF" and "weighted RF."
http://statistics.berkeley.edu/sites/default/files/tech-reports/666.pdf

Would someone please answer the following three questions?
(1) Is it possible to use the R randomForest package to learn random forests with either of these modified RF-learning algorithms?
(2) If it is possible, how does one do it?
(3) Is there detailed documentation for running these modified versions? I've read the R package manual, but it's too sketchy. It seems to be written primarily for users who are already familiar with the package and just need to look up a detail such as the name of an argument.
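For readers unfamiliar with the two variants, here is a minimal sketch of the ideas as the report describes them. It is written in plain Python (standard library only), not with the randomForest package, and the function names balanced_bootstrap, weighted_gini and weighted_majority are hypothetical. In "balanced RF", each tree is grown on a bootstrap sample drawn from the minority class plus the same number of cases drawn, with replacement, from the other class. In "weighted RF", each class gets a weight (larger for the minority class), and the weights enter both the Gini split criterion and the vote in each terminal node.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, rng=None):
    """Row indices for one tree under the 'balanced RF' idea: draw, with
    replacement, n_min cases from every class, where n_min is the size of
    the smallest (minority) class."""
    if rng is None:
        rng = random.Random()
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for cls in sorted(by_class):
        idx = by_class[cls]
        sample.extend(rng.choice(idx) for _ in range(n_min))
    return sample


def _weighted_totals(labels, class_weight):
    """Sum of class weights per class: each case of class c counts as class_weight[c]."""
    totals = defaultdict(float)
    for y in labels:
        totals[y] += class_weight[y]
    return totals


def weighted_gini(labels, class_weight):
    """Gini impurity of a node computed from weighted class proportions
    (the split criterion in the 'weighted RF' idea)."""
    totals = _weighted_totals(labels, class_weight)
    w = sum(totals.values())
    return 1.0 - sum((t / w) ** 2 for t in totals.values())


def weighted_majority(labels, class_weight):
    """Class prediction for a terminal node by weighted majority vote."""
    totals = _weighted_totals(labels, class_weight)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    labels = ["neg"] * 95 + ["pos"] * 5
    idx = balanced_bootstrap(labels, random.Random(0))
    print(sorted((c, sum(labels[i] == c for i in idx)) for c in set(labels)))
    node = ["neg"] * 9 + ["pos"]
    print(weighted_gini(node, {"neg": 1.0, "pos": 9.0}))
    print(weighted_majority(node, {"neg": 1.0, "pos": 10.0}))
```

With 95 "neg" and 5 "pos" cases, each balanced sample contains 5 cases of each class. Up-weighting "pos" raises the node's impurity and can flip its vote. In the R package itself, the thread linked above discusses doing this through the sampsize, strata, and classwt arguments of randomForest(). Check ?randomForest for their exact semantics.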