Hello,
I have written a user-defined split function for the rpart package, and I now
want to prune the fitted tree with a pruning function of my own.
To do so, I first want to grow a large tree with rpart and then apply my
function to prune it.
The problem is that growing a large tree with the user-defined split
function by setting the complexity parameter to 0 (cp=0) gives
me a smaller tree than when I set the complexity parameter to 0.01 (the
default value).
My question is: which node values does rpart use in order to
'prune' the tree? I first thought it was only the 'deviance' value, which is
an output of the evaluation function, but I am not so sure about that
anymore.
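For comparison, a minimal sketch of the grow-then-prune workflow described above, using rpart's built-in method rather than the user-defined one. The kyphosis data set ships with rpart. The settings minsplit = 2 and xval = 0 and the node-selection rule are illustrative assumptions, not taken from the actual setup. snip.rpart is shown only as one possible way for a custom pruning rule to cut the tree at node numbers it chooses itself.

```r
library(rpart)

# Grow a deliberately large tree: cp = 0 removes the complexity threshold.
# minsplit = 2 and xval = 0 are illustrative choices, not values from the post.
big <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             control = rpart.control(cp = 0, minsplit = 2, xval = 0))

# Built-in cost-complexity pruning back to a chosen cp.
pruned <- prune(big, cp = 0.01)

# A custom rule could instead select node numbers and cut the tree below them.
# Placeholder rule: take the first internal node below the root.
is_internal <- big$frame$var != "<leaf>"
internal <- as.integer(rownames(big$frame)[is_internal])
toss <- internal[internal != 1][1]
snipped <- snip.rpart(big, toss = toss)

# Number of nodes in each tree
c(big = nrow(big$frame), pruned = nrow(pruned$frame), snipped = nrow(snipped$frame))
```

With the built-in method, the cp = 0 tree is at least as large as any tree pruned from it, which is the behaviour I expected from the user-defined method as well.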
Example output of two trees (same data and split functions, different cp):
load('alist.R') # user defined split function
> fit1 <- rpart(time.discrete ~ x1+x2+x3+x4+x5+x6, datTrain,
+               control=list(cp=0.01), method=alist)
n= 3042
node), split, n, deviance, yval
* denotes terminal node
1) root 3042 3043.5170 2
  2) x3=2,3 1036 1231.5710 1
    4) x2=2,3,4,5 556 704.0214 1
      8) x6< 23.5 118 126.3924 1 *
      9) x6>=23.5 438 541.8522 1
        18) x1=1 164 214.2196 1 *
        19) x1=0 274 295.5250 2
          38) x5< 30.5 81 102.2434 1 *
          39) x5>=30.5 193 161.5116 3 *
    5) x2=0,1 480 454.6036 3 *
  3) x3=0,1 2006 1698.9710 3
    6) x6< 23.5 596 660.9713 2 *
    7) x6>=23.5 1410 978.9448 3
      14) x5< 19.5 323 342.9626 2 *
      15) x5>=19.5 1087 567.2176 4
        30) x1=0 633 200.8669 5 *
        31) x1=1 454 329.6705 3
          62) x2=0,1,3 254 109.3010 4 *
          63) x2=2,4,5 200 186.8194 3 *
> fit1$cptable
          CP nsplit rel error
1 0.03712005      0 1.0000000
2 0.02396757      1 0.9628800
3 0.02099862      2 0.9389124
4 0.01205192      4 0.8969151
5 0.01175506      5 0.8848632
6 0.01102346      6 0.8731082
7 0.01054950      7 0.8620847
8 0.01043856      8 0.8515352
9 0.01000000      9 0.8410966
> fit2 <- rpart(time.discrete ~ x1+x2+x3+x4+x5+x6, datTrain,
+               control=list(cp=0), method=alist)
n= 3042
node), split, n, deviance, yval
* denotes terminal node
1) root 3042 3.043517e+03 2
  2) x3=2,3 1036 1.231571e+03 1
    4) x2=2,3,4,5 556 7.040214e+02 1
      8) x6< 23.5 118 1.263924e+02 1
        16) x5< 42.5 73 5.778729e+01 1
          32) x1=1 31 4.888716e-10 1 *
          33) x1=0 42 4.611536e+01 1 *
        17) x5>=42.5 45 5.607449e+01 1 *
      9) x6>=23.5 438 5.418522e+02 1 *
    5) x2=0,1 480 4.546036e+02 3 *
  3) x3=0,1 2006 1.698971e+03 3 *
> fit2$cptable
           CP nsplit rel error
1 0.037120046      0 1.0000000
2 0.023967574      1 0.9628800
3 0.011755057      2 0.9389124
4 0.004117156      3 0.9271573
5 0.003835016      4 0.9230402
6 0.000000000      5 0.9192052
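As a consistency check on the numbers above, the following sketch recomputes the CP column of fit2$cptable from the 'rel error' and 'nsplit' columns, and the final 'rel error' from the deviances of the terminal nodes. All values are copied from the printed output. The variable names are made up for illustration. The check only shows that the printed numbers agree with "CP = decrease in rel error per added split" and "rel error = sum of terminal-node deviances / root deviance"; it is not taken from the rpart source code.

```r
# Columns of fit2$cptable, copied from the output above
nsplit    <- c(0, 1, 2, 3, 4, 5)
rel_error <- c(1.0000000, 0.9628800, 0.9389124, 0.9271573, 0.9230402, 0.9192052)
cp_posted <- c(0.037120046, 0.023967574, 0.011755057, 0.004117156, 0.003835016)

# Decrease in relative error per additional split between consecutive rows
cp_recomputed <- -diff(rel_error) / diff(nsplit)

# Deviances of the terminal nodes of fit2 (nodes 32, 33, 17, 9, 5, 3)
leaf_dev <- c(4.888716e-10, 4.611536e+01, 5.607449e+01,
              5.418522e+02, 4.546036e+02, 1.698971e+03)
root_dev <- 3.043517e+03

# Relative error of the full tree, expressed through node deviances only
rel_error_final <- sum(leaf_dev) / root_dev

print(cbind(cp_posted, cp_recomputed))
print(rel_error_final)
```

So at least in the cptable, the 'deviance' values alone reproduce both columns to the printed precision. That still does not explain why node 3 is split in fit1 but left as a terminal node in fit2.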
Thank you
Peter Mayer