Greetings. I'm trying to determine whether to use rpart or randomForest for a classification tree. Has anybody tested efficacy formally? I've run both, and the confusion matrix for rf beats rpart. I've been looking at the rf help page and am unable to figure out how to extract the tree.

But more than that, I'm looking for a more comprehensive user's guide for randomForest, including the benefits of using it with MDS. Can anybody suggest a general guide? I've been finding a lot of broken links and CS-type web pages rather than an end-user's guide.

Also, people's experience with adjusting the mtry parameter would be useful. Breiman says that performance isn't too sensitive to it, but I'm curious whether anybody has had a different experience.

Thanks in advance, and apologies if this is too general.
>>>>> "Anonymous" == <chumpmonkey at hushmail.com>
>>>>>     on Sat, 12 Apr 2003 14:41:00 -0700 writes:

    Anonymous> [...] I've run both and the confusion matrix for rf beats
    Anonymous> rpart. I've been looking at the rf help page and am
    Anonymous> unable to figure out how to extract the tree. [...]

If you really read Breiman, or alternatively, remember English, you'll know that a forest has many trees...

Regards,
Martin Maechler <maechler at stat.math.ethz.ch> http://stat.ethz.ch/~maechler/
I think you are misunderstanding what randomForest does. It is not an optimizer that spits the "best" tree back at you. It grows a forest of trees (as many as you tell it to; 500 is the default). I would stick to rpart if you are having trouble wrapping your head around randomForest. Tree models are being used in many fields now, and you should be able to find an applied guide in your field with a little effort.

Good luck,
Andy
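To make the "one tree versus a forest" distinction concrete, here is a minimal sketch in R. It assumes the rpart and randomForest packages are installed, and it uses the built-in iris data purely as a stand-in; neither the data set nor the object names come from this thread.

```r
library(rpart)          # fits a single classification tree
library(randomForest)   # fits an ensemble of many trees

set.seed(1)  # each tree is grown on a random bootstrap sample

# iris is only a stand-in data set, not the poster's data
single.tree <- rpart(Species ~ ., data = iris, method = "class")
forest      <- randomForest(Species ~ ., data = iris)  # ntree left at its default

class(single.tree)   # an "rpart" object: one tree you can print or plot
forest$ntree         # number of trees grown in the forest

# A smaller forest is just a matter of setting ntree explicitly
small.forest <- randomForest(Species ~ ., data = iris, ntree = 50)
small.forest$ntree
```

Printing `single.tree` shows one set of splits, whereas the forest has no single "best" tree to print; its prediction is a vote over all `ntree` trees.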
One of these days I promise to write a package vignette...

As Martin said, RF uses many trees (500 by default). The "forest" component of the randomForest object contains all the trees, but not in an easily readable form (because I don't see much use in "looking" at the trees except for debugging purposes). If you really want to see what a tree looks like, grow just one tree and look at the "forest" component. Here is some explanation:

For each tree:

 o "nrnodes" is the maximum number of nodes a tree can have.
 o "ndbigtree" is a vector of length ntree containing the total number of nodes in the trees.
 o "nodestatus" is a nrnodes by ntree matrix of indicators: -1 if the node is terminal.
 o "treemap" is a 3-D array, containing a two-column matrix for each tree. The first column indicates which node is the "left descendant" and the second column the "right descendant". Both are 0 if the node is terminal.
 o "bestvar" is a nrnodes by ntree matrix that indicates, for each node, which variable is used to split that node. It is 0 for terminal nodes.
 o "xbestsplit" has the same layout as "bestvar", except it tells where to split.

One thing people should keep in mind about the "predicted" component of the randomForest object (and the confusion matrix for the training data), as well as "predict(rf.object)" without giving the newdata for prediction: that prediction is based on out-of-bag samples, so it is *NOT* the same as the usual prediction on training data. It is closer to out-of-sample prediction as in, e.g., cross-validation.

AFAIK there is only empirical and anecdotal evidence on the sensitivity of performance to the value of mtry. I can say that in my own experience, fiddling with mtry will give at best marginal improvement. One easy way to answer the question for your situation is to try it yourself and see.

With MDS on the proximity matrix, you probably need to be a bit careful in its interpretation.
The proximity matrix of the training data is computed on the *entire* training data, rather than just the out-of-bag portion. Thus the MDS plot will quite often show classes that look more "separable" than they really are. (We are thinking about a fix. Breiman pointed out that the difficulty is that if the proximity matrix is calculated only on the out-of-bag data, then 1-proximity is no longer positive definite.)

HTH,
Andy
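The points in the message above (looking at one tree, out-of-bag versus ordinary training-set prediction, and MDS on the proximity matrix) can be sketched as follows. This is only an illustration under stated assumptions: it uses a current release of randomForest, whose getTree() helper is not mentioned in the thread, and the built-in iris data as a stand-in.

```r
library(randomForest)

set.seed(42)
# iris is a stand-in data set; proximity = TRUE also stores the proximity matrix
rf <- randomForest(Species ~ ., data = iris, ntree = 100, proximity = TRUE)

# One tree of the forest as a table: daughter nodes, split variable,
# split point, node status (-1 = terminal) and terminal-node prediction
tree1 <- getTree(rf, k = 1, labelVar = TRUE)
head(tree1)

# Out-of-bag predictions: each case is voted on only by trees whose
# bootstrap sample did not contain it (predict() without newdata)
oob.pred <- predict(rf)
oob.err  <- mean(oob.pred != iris$Species)

# Ordinary prediction on the training data: every tree votes, including
# the trees that were grown on the case being predicted
train.pred <- predict(rf, newdata = iris)
train.err  <- mean(train.pred != iris$Species)

c(oob = oob.err, training = train.err)

# Classical MDS on 1 - proximity; the proximities here come from all
# training cases, so class separation may look better than it really is
mds <- cmdscale(1 - rf$proximity, k = 2)
head(mds)
```

The training error will usually come out lower than the out-of-bag error, which is why the two should not be compared as if they were the same quantity. Plotting `mds` coloured by class gives the MDS picture discussed above; current releases also accept an `oob.prox` argument to randomForest() for proximities based on out-of-bag cases only.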
I can echo that in data I've worked with (separate from the data Andy Liaw has worked with), fiddling with mtry doesn't make a whole lot of difference. To the extent it makes any difference at all, the default value tends to be near the optimum.

Matt Wiener
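Trying it yourself, as suggested earlier in the thread, might look like the sketch below. It assumes a current release of randomForest, where the classification default for mtry is floor(sqrt(p)), and uses the built-in iris data as a stand-in; the object names are made up for illustration.

```r
library(randomForest)

set.seed(123)
p <- ncol(iris) - 1               # number of predictors (4 for iris)
default.mtry <- floor(sqrt(p))    # classification default in current releases

# Out-of-bag error rate after the last tree, for every possible mtry
oob.by.mtry <- sapply(seq_len(p), function(m) {
  fit <- randomForest(Species ~ ., data = iris, mtry = m, ntree = 500)
  fit$err.rate[fit$ntree, "OOB"]
})
names(oob.by.mtry) <- paste0("mtry=", seq_len(p))

default.mtry
oob.by.mtry
```

If the error rates differ only by about the amount they change when you rerun with a different seed, that matches the experience reported in this thread. The package also ships a tuneRF() function that automates a similar search.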
I just saw in the preliminary program for JSM '03 that there will be (at least) 5 talks on random forests (one from our group), two of which will address the issue of tuning mtry, judging from the abstracts.

If I may do a bit of advertising: I was asked to organize a roundtable luncheon at the JSM on multiple trees. I'd welcome anyone interested in this area to come.

Cheers,
Andy