thr3ads.net - R help - [R] regression tree xerror [Mar 2005]


Sherri Miller

2005-Mar-29 17:10 UTC

[R] regression tree xerror

I am running some models (for the first time) using rpart and am getting
results I don't know how to interpret. I'm using cross-validation to prune
the tree and the results look like:
Root node error: 172.71/292 = 0.59148

n= 292

         CP nsplit rel error  xerror     xstd
1  0.124662      0   1.00000 1.00731 0.093701
2  0.064634      1   0.87534 1.08076 0.092337
3  0.057300      2   0.81070 1.07684 0.095582
4  0.038462      4   0.69610 0.99104 0.091659
5  0.036200      5   0.65764 1.01596 0.094635
6  0.029228      7   0.58524 1.00058 0.095440
7  0.028779      8   0.55601 1.00704 0.093242
8  0.024192      9   0.52724 0.97844 0.088936
9  0.018038     11   0.47885 1.02749 0.092263
10 0.016867     13   0.44278 1.08704 0.092112
11 0.015465     14   0.42591 1.10805 0.097813
12 0.015000     15   0.41044 1.11130 0.097881

I do not understand why the rel error rate is going down, but the xerror
generally goes up. For some of the runs, the xerror never goes down. Is this
result caused by something in my data structure? I have run some example
datasets from the various help manuals and the xerror goes down, as one
would expect. Any suggestions?

Sherri

Sherri L. Miller
Wildlife Biologist
Redwood Sciences Laboratory
707.825.2949
707.825.2901 (FAX)

Luis Torgo

2005-Mar-29 18:13 UTC

[R] regression tree xerror

rel error is estimated with the training data (the sample used for
obtaining the tree), and thus it decreases as the tree grows, because
the tree becomes more and more closely fitted to that data. This apparently
better performance should not be taken as "real" when predicting for a
new sample of data, because larger trees tend to overfit the training
sample and will hardly generalise well to fresh data samples.
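
As a rough illustration of the point above, the CP table from the original post can be typed into base R and inspected directly. This is only a sketch: the numbers are copied from the post, while the data frame and variable names (cp_table, gap, and so on) are made up for the example.

```r
# CP table copied from the original post (rpart output, n = 292).
# Column and object names here are illustrative, not part of rpart.
cp_table <- data.frame(
  CP        = c(0.124662, 0.064634, 0.057300, 0.038462, 0.036200, 0.029228,
                0.028779, 0.024192, 0.018038, 0.016867, 0.015465, 0.015000),
  nsplit    = c(0, 1, 2, 4, 5, 7, 8, 9, 11, 13, 14, 15),
  rel_error = c(1.00000, 0.87534, 0.81070, 0.69610, 0.65764, 0.58524,
                0.55601, 0.52724, 0.47885, 0.44278, 0.42591, 0.41044),
  xerror    = c(1.00731, 1.08076, 1.07684, 0.99104, 1.01596, 1.00058,
                1.00704, 0.97844, 1.02749, 1.08704, 1.10805, 1.11130),
  xstd      = c(0.093701, 0.092337, 0.095582, 0.091659, 0.094635, 0.095440,
                0.093242, 0.088936, 0.092263, 0.092112, 0.097813, 0.097881)
)

# Training (resubstitution) error drops with every additional split.
rel_error_decreasing <- all(diff(cp_table$rel_error) < 0)

# Cross-validated error does not follow the same pattern.
best <- which.min(cp_table$xerror)

# Difference between cross-validated and training error for each tree size.
gap <- cp_table$xerror - cp_table$rel_error

print(rel_error_decreasing)
print(cp_table[best, ])
print(round(gap, 3))
```

In this table the gap between xerror and rel error widens with every row, which is the overfitting pattern described above. The lowest xerror (row 8, 9 splits) is also less than one xstd below the xerror of the root node, so these cross-validation results give little evidence that any of the trees predicts better than the root.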

That's the motivation for the xerror (and xstd) estimates. These are
more realistic estimates of the performance of the tree on new samples
of data. The rpart function obtains them through an internal
cross-validation process. The function prune() can be used to select a subtree
of the tree obtained with rpart() if you think (by looking at the xerror
estimates) you would be better off with this subtree.
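
A minimal sketch of that workflow follows. Everything specific in it is an assumption made for illustration: it uses the built-in mtcars data rather than the poster's data, the control settings are arbitrary, and picking the row with the lowest xerror is just one possible selection rule.

```r
library(rpart)

set.seed(1)  # cross-validation folds are random, so fix the seed

# Grow a deliberately large regression tree on the built-in mtcars data.
# cp, minsplit and xval are arbitrary choices for this example.
fit <- rpart(mpg ~ ., data = mtcars, method = "anova",
             control = rpart.control(cp = 0.001, minsplit = 5, xval = 10))

# Prints the same kind of CP / nsplit / rel error / xerror / xstd table
# as shown in the original post.
printcp(fit)

# Take the complexity parameter of the row with the lowest
# cross-validated error (an assumed selection rule, not the only one).
cp_tab  <- fit$cptable
best_cp <- cp_tab[which.min(cp_tab[, "xerror"]), "CP"]

# Cut the tree back to the subtree that corresponds to that CP value.
pruned <- prune(fit, cp = best_cp)
print(pruned)
```

Because the folds are drawn at random, xerror and xstd change from run to run unless the seed is fixed, so the selected subtree can differ between runs on a small data set.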

Hope this helps.

Luis Torgo

-- 
Luis Torgo
    FEP/LIACC, University of Porto   Phone : (+351) 22 339 20 93
    Machine Learning Group           Fax   : (+351) 22 339 20 99
    R. de Ceuta, 118, 6o             email : ltorgo at liacc.up.pt
    4050-190 PORTO - PORTUGAL        WWW   : http://www.liacc.up.pt/~ltorgo
