With the anova method, all of the printed results are scaled by the RSS for
the top node. The relative error values for the trees are therefore already
1 - R^2.
library(rpart)
library(survival)   # provides the lung data set
tfit <- rpart(time ~ ., data = lung)
summary(tfit)
          CP nsplit rel error   xerror      xstd
1 0.03665178      0 1.0000000 1.010097 0.1136942
2 0.03310179      1 0.9633482 1.079216 0.1172675
3 0.03029365      2 0.9302464 1.109587 0.1173583
4 0.01963453      3 0.8999528 1.249586 0.1327888
5 0.01627146     11 0.7396726 1.238411 0.1310952
6 0.01507635     12 0.7234012 1.260919 0.1337384
7 0.01031566     13 0.7083248 1.282740 0.1399397
8 0.01000000     14 0.6980091 1.296213 0.1396711

Node number 1: 228 observations,    complexity param=0.03665178
  mean=305.2325, MSE=44176.93
  left son=2 (81 obs) right son=3 (147 obs)
  Primary splits:
      pat.karno < 75   to the left,  improve=0.03661157, (3 missing)
      ph.ecog   < 1.5  to the right, improve=0.03620793, (1 missing)
      status    < 1.5  to the right, improve=0.02930372, (0 missing)
      ph.karno  < 85   to the left,  improve=0.02058114, (1 missing)
      sex       < 1.5  to the left,  improve=0.01679999, (0 missing)
  Surrogate splits:
      ph.ecog  < 1.5  to the right, agree=0.787, adj=0.392, (3 split)
      ph.karno < 75   to the left,  agree=0.751, adj=0.291, (0 split)
      age      < 72.5 to the right, agree=0.680, adj=0.089, (0 split)

Node number 2: 81 observations,    complexity param=0.03310179
  mean=251.0247, MSE=34100.99
  left son=4 (59 obs) right son=5 (22 obs)
  Primary splits:
      wt.loss < 21   to the left,  improve=0.12735970, (7 missing)
      status  < 1.5  to the right, improve=0.08060663, (0 missing)
      age     < 68.5 to the right, improve=0.04906869, (0 missing)
      inst    < 2.5  to the left,  improve=0.04148716, (0 missing)
      sex     < 1.5  to the left,  improve=0.02401074, (0 missing)
  Surrogate splits:
      ph.karno < 55 to the right, agree=0.743, adj=0.095, (6 split)
etc.

The first split has R^2 = .0367, which is 1 - rel error after one split (row 2
of the table at the top) and is essentially the improvement measure for the
node.
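
As a rough sketch of the point about the table, the R^2 after each number of
splits can be read straight off the fitted object. This assumes the lung data
comes from the survival package and that the CP table is available as
tfit$cptable with a column named "rel error"; the names cp_tab and r2 are
just illustrative.

```r
library(rpart)
library(survival)   # assumed source of the lung data set

tfit <- rpart(time ~ ., data = lung)

# cptable is a matrix; "rel error" is the tree's RSS divided by the root RSS
cp_tab <- tfit$cptable
r2 <- 1 - cp_tab[, "rel error"]

# R^2 of the tree after each number of splits
print(cbind(nsplit = cp_tab[, "nsplit"], R2 = r2))
```

The first row (no splits) should give R^2 = 0, since the relative error of
the root is 1 by construction.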
The second split has R^2 = .127 for the obs within that node; it improves the
R^2 for the model as a whole by .033.
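
The difference between the within-node and whole-model figures can be
sketched from the node deviances stored in tfit$frame. This assumes the usual
rpart numbering (node k has children 2k and 2k+1) and that node 2 is split in
the fitted tree; the variable names are illustrative. The within-node ratio
computed this way includes observations sent down by surrogate splits, so it
need not match the printed improve value exactly.

```r
library(rpart)
library(survival)   # assumed source of the lung data set

tfit <- rpart(time ~ ., data = lung)

# Deviance (residual sum of squares) of each node, indexed by node number
dev <- setNames(tfit$frame$dev, rownames(tfit$frame))

# Reduction in RSS from splitting node 2 into nodes 4 and 5
gain <- dev["2"] - (dev["4"] + dev["5"])

within_node <- unname(gain / dev["2"])   # relative to node 2 only
whole_model <- unname(gain / dev["1"])   # relative to the root node

print(c(within_node = within_node, whole_model = whole_model))
```

The same reduction in RSS is divided by two different denominators, which is
why the whole-model figure is the smaller of the two.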
Terry T.