According to Dr. Breiman, RF should be a more accurate method than a
single tree. However, in my case the performance of each method seems
to depend on the proportion of the outcome variable. My data set is a
typical classification problem (predict bad guys). When I ran both of
them with different proportions of the outcome variable (there's a
criterion to measure the degree of bad behavior), I got very strange
results.

1. proportion of 1 to 0 = 1:4
   err.rate of CART = 25.2%
   err.rate of RF   = 25.6%

2. 1:9
   err.rate of CART = 28.5%
   err.rate of RF   = 21.2%

3. 1:33
   err.rate of CART = 28.2%
   err.rate of RF   = 12.1%

4. 1:99
   err.rate of CART = 25.1%
   err.rate of RF   =  7.3%

In 3 & 4, RF looks superior to CART. But I'm afraid RF just votes for
"0" to reduce the error rate. Any suggestions? Thank you.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe" (in the "body", not the subject !)
To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
At 12:51 PM 9/25/2002, Andrew Baek wrote:
>According to Dr. Breiman, RF should be a more accurate method than a
>single tree. [...]
>
>In 3 & 4, RF looks superior to CART. But I'm afraid RF just votes for
>"0" to reduce the error rate. Any suggestions?

Where are you getting CART results in R? CART is a trademark of
Salford Systems and is not implemented AFAIK in R (or S-Plus).
If either method were just guessing 0 to reduce the error rate,
shouldn't they achieve a 1/34 ~ 3% or 1/100 = 1% error rate in the
last two examples? And, for that matter, 20% and 10% in the first two?
It doesn't look like that's what's going on.

One suggestion if making sure you find the 1's is more important than
having a low overall error rate: in rpart, you can specify a loss
matrix to say that certain kinds of errors are more important than
others. In a random forest, you can use different voting thresholds
for "1-ness" and "0-ness" to bias things -- that is, instead of just
taking a majority vote, you might require (for example) 85% of the
trees to agree for something to be declared in class 0.

It's hard to say much more without knowing anything about your data.
But in my experience random forests have substantially outperformed
single trees in many problems (and I haven't yet encountered one in
which a single tree outperformed a random forest).

Hope this helps,

Matthew Wiener
RY84-202
Applied Computer Science & Mathematics Dept.
Merck Research Labs
126 E. Lincoln Ave.
Rahway, NJ 07065
732-594-5303

-----Original Message-----
From: Andrew Baek [mailto:andrew at stat.ucla.edu]
Sent: Wednesday, September 25, 2002 3:52 PM
To: r-help at stat.math.ethz.ch
Subject: [R] CART vs. Random Forest

[...]

------------------------------------------------------------------------------
Notice: This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (Whitehouse Station, New Jersey, USA)
that may be confidential, proprietary copyrighted and/or legally
privileged, and is intended solely for the use of the individual or
entity named on this message. If you are not the intended recipient,
and have received this message in error, please immediately return
this by e-mail and then delete it.
=============================================================================
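The rpart loss-matrix suggestion above can be sketched as follows. The simulated data, the 5:1 cost ratio, and all variable names are illustrative only:

```r
## Sketch of a loss matrix in rpart: make a false negative (missing a
## "1") five times as costly as a false positive.  Toy data.
library(rpart)

set.seed(1)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
## rare "bad" class "1"
d$y <- factor(ifelse(d$x1 + 0.5 * d$x2 + rnorm(n) > 1.8, 1, 0))

## Rows = true class, columns = predicted class, in the order of
## levels(d$y) = c("0", "1").  Diagonal must be zero.
## L[2, 1] = 5: predicting "0" when the truth is "1" costs 5.
L <- matrix(c(0, 5,
              1, 0), nrow = 2)

fit.loss <- rpart(y ~ x1 + x2, data = d, parms = list(loss = L))
fit.0    <- rpart(y ~ x1 + x2, data = d)   # default 0/1 loss

## Compare confusion matrices: the loss matrix should trade false
## positives for fewer false negatives.
table(true = d$y, pred = predict(fit.loss, type = "class"))
table(true = d$y, pred = predict(fit.0,    type = "class"))
```

Note that rpart reads the loss matrix with rows as true classes and columns as predicted classes, in the order of the factor levels.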
We haven't implemented different voting thresholds in the package
itself, but when you predict you can get out votes or probabilities
rather than classes if you want. The argument type to
predict.randomForest is "class" by default, but can also be "vote" or
"prob". You can use the training set to figure out what a good
threshold is, and then check your results on a test set. Then you just
use that threshold later.

I suppose we could implement a threshold that could be supplied to
predict, but then we'd have to work something out for multi-class
problems -- several different cutpoints, I guess. It's not a priority
for Andy or me right now. I actually like to take a look at the ROC
curve anyway, to decide what tradeoffs are worthwhile.

I'd compare the results by looking at the error rates -- if you can
make the (possibly weighted) error rate lower with one method or the
other, that's the method that wins.

Regards,

Matt

-----Original Message-----
From: Andrew Baek [mailto:andrew at stat.ucla.edu]
Sent: Thursday, September 26, 2002 3:33 PM
To: Wiener, Matthew
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] CART vs. Random Forest

> One suggestion if making sure you find the 1's is more important
> than having a low overall error rate: in rpart, you can specify a
> loss matrix to say that certain kinds of errors are more important
> than others. In a random forest, you can use different voting
> thresholds for "1-ness" and "0-ness" to bias things -- that is,
> instead of just taking majority vote, you might require (for
> example) 85% of the trees to agree for something to be declared in
> class 0.

If I use a loss matrix in "rpart" and a different threshold in "RF",
how can I compare the two packages? Well, Andy Liaw told me "classwt"
in RF does not help much. But when I modified the priors in rpart, I
got totally new results, so I thought the same idea should apply to
RF. Also, I'd appreciate it if you could tell me how to change the
voting threshold in RF; I couldn't find it in the manual. Thank you.

Andrew
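The type = "prob" approach described above might look like the following sketch; the simulated data, the 70/30 split, and the 0.15 cutoff are illustrative, not recommendations:

```r
## Predict class probabilities with a random forest, then apply a
## non-majority cutoff: calling "1" whenever more than 15% of trees
## vote "1" is the same as requiring 85% agreement to declare "0".
library(randomForest)

set.seed(1)
n <- 1000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(ifelse(d$x1 + 0.5 * d$x2 + rnorm(n) > 1.8, 1, 0))
train <- d[1:700, ]
test  <- d[701:1000, ]

fit <- randomForest(y ~ x1 + x2, data = train)

## one probability column per class; take the "1" column
p1   <- predict(fit, newdata = test, type = "prob")[, "1"]
pred <- factor(ifelse(p1 > 0.15, "1", "0"), levels = levels(d$y))

table(true = test$y, pred = pred)
```

In practice you would tune the cutoff on the training set (or out-of-bag predictions) and only then apply it to the test set, as described in the message above.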
I wouldn't bother modifying classwt -- that doesn't seem to have much
effect (as Breiman has mentioned to Andy Liaw).

I use the following function, "biased.predict". (Not specific to
random forests, which is why it's not in the package.)

biased.predict <- function(object, newdata, thresh,
                           which.test = "Bad", if.high = "Bad",
                           if.low = "Good", pred.type = "prob") {
  ## predicted class probabilities, one column per class
  probs <- predict(object, newdata = newdata, type = pred.type)
  levels <- dimnames(probs)[[2]]
  ## declare if.high when the probability of the which.test class
  ## exceeds thresh, if.low otherwise
  ans <- apply(probs, 1, function(x) {
    ifelse(x[which.test] > thresh, if.high, if.low)
  })
  factor(ans, levels = levels)
}

You can get the errors of different types -- the confusion matrix --
from table(data.frame(true = true.vals, pred = pred.vals)), and then
multiply this by a weight matrix to get a weighted error score. You
can run biased.predict for a number of different threshold values and
check the weighted error scores, choosing the threshold that gives you
the lowest. (Though running this over and over is inefficient --
better to predict the probabilities once and then apply multiple
cutoffs.) Or you can choose your threshold by saying that one type of
error must be no larger than a certain value (which is what I've
usually done, precisely to limit false negatives, as you want to).
Once you've chosen a threshold, you can use biased.predict for new
data.

I hope I'm making sense, and that this helps.

Matt

-----Original Message-----
From: Andrew Baek [mailto:andrew at stat.ucla.edu]
Sent: Thursday, September 26, 2002 4:36 PM
To: Wiener, Matthew
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] CART vs. Random Forest

Of course, CART and RF are different methods. But at the least, I have
to account for the fact that a false negative is more serious than a
false positive in my problem. For this purpose, I used "prior" in
rpart and "classwt" in RF. Then, should I modify the priors and the
cut-off point at the same time?

Andrew

On Thu, 26 Sep 2002, Wiener, Matthew wrote:

> We haven't implemented different voting thresholds in the package
> itself, but when you predict you can get out votes or probabilities
> rather than classes if you want. [...]
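The "predict the probabilities once, then try multiple cutoffs" search described above can be sketched in base R. Here truth, prob1, and the weight matrix W are all made up for illustration; prob1 stands in for one column of predict(..., type = "prob"):

```r
## Choose a cutoff by minimizing a weighted error: confusion matrix
## times weight matrix, summed.  All data here is simulated.
set.seed(1)
n     <- 500
truth <- factor(sample(c("Good", "Bad"), n, replace = TRUE,
                       prob = c(0.9, 0.1)))
## fake probabilities of "Bad", higher on average for true "Bad"
prob1 <- pmin(pmax(ifelse(truth == "Bad", 0.4, 0.1) +
                   rnorm(n, 0, 0.15), 0), 1)

## Rows = true, columns = predicted, in level order c("Bad", "Good").
## A false negative (true Bad predicted Good) costs 5, a false
## positive costs 1 -- an illustrative 5:1 ratio.
W <- matrix(c(0, 1,
              5, 0), nrow = 2,
            dimnames = list(true = c("Bad", "Good"),
                            pred = c("Bad", "Good")))

weighted.err <- function(th) {
  pred <- factor(ifelse(prob1 > th, "Bad", "Good"),
                 levels = c("Bad", "Good"))
  cm <- table(true = truth, pred = pred)   # 2x2 confusion matrix
  sum(cm * W) / n                          # weighted error score
}

cutoffs <- seq(0.05, 0.95, by = 0.05)
errs    <- sapply(cutoffs, weighted.err)
best    <- cutoffs[which.min(errs)]
```

The probabilities are computed once and only the (cheap) thresholding is repeated, which is the efficiency point made in the message above.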
> I suppose we could implement a threshold that could be supplied to
> predict, but then we'd have to work something out for multi-class
> problems -- several different cutpoints, I guess. It's not a
> priority for Andy or me right now.

Leo has been working on this for his Version 4. Thus I see no reason
for me to spend time on it now 8-). From the alpha code that he sent
me, he has three different ways of thresholding. They all work after
the trees are grown, so the OOB estimates do not reflect the
threshold. I don't see a good way to deal with that so far.

Andy