Hi all,

What parameter do I normally change in the rpart function? How do I set the "cp" option?

Is there a way to read off the error rate directly from the "rpart" function for training data? I imagine for test data I have to apply "predict", but for training data I guess the error count would already exist somewhere once the "rpart" function has finished. It looks like it is related to expressions such as "expected loss=0.8362365" when using the "summary" function.

Right now I do this manually, and comparing correct vs. wrong predictions and counting the errors is always very tedious...

Thanks a lot!

M.
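A minimal sketch of both questions, assuming a classification tree and using the built-in iris data as a stand-in for your own data. cp can be set through rpart.control(), or passed straight to rpart(), which forwards it. Calling predict() without newdata returns the fitted classes for the training data, so the training error is one comparison away. Note that the "expected loss" printed by summary() is per node: with the default priors and loss matrix it is the fraction of cases in that node that do not belong to the node's predicted class, not the error rate of the whole tree.

```r
library(rpart)

# cp set via rpart.control(); rpart(..., cp = 0.001) would do the same
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0.001))

# No newdata: predict() returns fitted classes for the training data
pred_train <- predict(fit, type = "class")

# Confusion matrix (rows = predicted, columns = observed)
conf_mat <- table(predicted = pred_train, observed = iris$Species)
print(conf_mat)

# Training (resubstitution) error rate: off-diagonal cases / all cases
train_err <- 1 - sum(diag(conf_mat)) / sum(conf_mat)
print(train_err)
```

The same number is simply mean(pred_train != iris$Species); the table is kept because it also shows which classes get confused.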
I see! So you mean I have to collect the error counts myself, manually...

By the way, what parameters do I normally change to improve the default rpart performance?

Thanks a lot!

On 3/8/06, Carlos Ortega <coforfe@gmail.com> wrote:
> Hello Michael,
>
> In some of the examples for the rpart function you will find that comparison between the actual and the predicted values, although it is for classification, not for regression.
>
> Regards,
> Carlos.
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
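Counting by hand is not strictly necessary: rpart already stores the information in the fitted object. A minimal sketch, assuming a classification tree with the default priors and loss matrix (iris again stands in for real data). The "Root node error" shown by printcp() is the misclassification rate of the root node, and the "rel error" column of cptable is the tree's error relative to that, so their product for the last row (the tree as grown) is the training error rate.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")

# Root node error: misclassified cases at the root divided by cases
# (the same figure printcp() reports as "Root node error")
root_err <- fit$frame$dev[1] / fit$frame$n[1]

# cptable rows are nested subtrees; the last row is the full fitted tree
cp_tab  <- fit$cptable
rel_err <- cp_tab[nrow(cp_tab), "rel error"]

# Training error rate without calling predict()
train_err_cp <- rel_err * root_err
print(train_err_cp)

# Cross-check by counting errors directly
train_err_direct <- mean(predict(fit, type = "class") != iris$Species)
print(train_err_direct)
```

With non-default priors or a custom loss matrix the product is a prior-weighted risk rather than a plain error count, so the direct comparison is the safer check in that case.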
Yes, rpart.control() has a bunch of parameters... I don't know which one improves the classification performance the most.

On 3/9/06, Carlos Ortega <coforfe@gmail.com> wrote:
> Hello,
>
> Yes, check rpart.control() for details.
>
> Regards,
> Carlos.
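For orientation, a sketch of the main rpart.control() arguments with their documented defaults. minsplit, minbucket, maxdepth and cp are all stopping rules, and whichever binds first decides the tree size. maxcompete and maxsurrogate only control how many competitor and surrogate splits are retained, and the surrogate settings matter only when predictors have missing values, so on complete data they should not change predictions. The loop below (a hypothetical illustration on iris) counts leaves for several minsplit values, once with the default cp and once with cp = 0, to show whether minsplit actually binds.

```r
library(rpart)

# Defaults written out explicitly (see ?rpart.control)
ctrl <- rpart.control(minsplit  = 20,  # min. cases in a node before a split is tried
                      minbucket = 7,   # min. cases in any leaf; default round(minsplit/3)
                      cp        = 0.01,# split must improve overall fit by this fraction
                      maxdepth  = 30,  # max. depth of any node, root counted as depth 0
                      xval      = 10)  # number of cross-validation folds

# Helper (hypothetical): number of terminal nodes in a fitted tree
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

ms_values <- c(5, 10, 20, 40)
sizes <- sapply(ms_values, function(ms) {
  c(default_cp = n_leaves(rpart(Species ~ ., data = iris, method = "class",
                                control = rpart.control(minsplit = ms))),
    cp_zero    = n_leaves(rpart(Species ~ ., data = iris, method = "class",
                                control = rpart.control(minsplit = ms, cp = 0))))
})
colnames(sizes) <- paste0("minsplit=", ms_values)
print(sizes)
```

If the default_cp row stays constant across minsplit values, cp is stopping the growth before minsplit ever comes into play, which is one way a parameter can appear to have no effect.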
I believe it would be more productive to understand what the parameters do than to tune them blindly by brute force. Whether tuning any of the parameters has an impact on the error rate (I assume you're referring to some estimate of the test-set error rate, as there's not much point in looking at the training error rate) can also depend on the nature of your data.

I believe the rpart package comes with a PDF of a tech report by its original authors. It's worth reading.

Andy

From: Michael
>
> I've spent many hours on these parameters.
>
> I changed them one by one and also tried all possible combinations exhaustively.
>
> To my surprise, only "cp" affects the performance of the classifier.
>
> Others, e.g. "maxsplit", etc., do not affect the error rate at all.
>
> I felt cheated by rpart.control().
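Along the lines Andy suggests, a minimal sketch of judging cp by an estimate of the test error rather than the training error. It grows a deliberately large tree, uses the cross-validated relative error (xerror) that rpart already stores in cptable to choose cp, prunes, and then checks on held-out data. The 100/50 split of iris and the "minimum xerror" rule are illustrative assumptions; the 1-SE rule discussed in the rpart documentation is a common alternative. (As an aside, rpart.control() has no "maxsplit" argument; the size-related ones are minsplit, minbucket and maxdepth.)

```r
library(rpart)

set.seed(1)  # CV folds and the train/test split are random

# Hypothetical hold-out split: 100 cases for training, 50 for testing
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Grow a large tree (cp = 0) with 10-fold cross-validation
big <- rpart(Species ~ ., data = train, method = "class",
             control = rpart.control(cp = 0, xval = 10))
printcp(big)

# Choose the cp with the smallest cross-validated relative error
cpt     <- big$cptable
best_cp <- cpt[which.min(cpt[, "xerror"]), "CP"]
pruned  <- prune(big, cp = best_cp)

# Error rate on the held-out cases
test_err <- mean(predict(pruned, newdata = test, type = "class") != test$Species)
print(test_err)
```

Because xerror depends on the random folds, it is worth re-running with a few seeds before reading much into small differences between cp values.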