Hi Listers,
I need to calculate and then plot a frequency histogram of the best tree
size chosen using the 1-se rule. I have included some code that has worked
well for me in the past, but it only selects the tree with the minimum
cross-validation error. Below are the call for my model, some relevant
output, and the code for selecting and plotting the frequency histogram of
the minimum xerror.
Here is the output referenced by the code below:
Regression tree:
rpart(formula = chbiomsq ~ HC + BC + POC + RUG + Depth + Exp +
    DFP + FI + LAT, data = ch, method = "anova",
    control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
Variables actually used in tree construction:
[1] BC Depth DFP Exp
Root node error: 47456/99 = 479.35
n= 99
         CP nsplit rel error  xerror     xstd
1  0.344626      0   1.00000 1.02074 0.139585
2  0.179054      1   0.65537 0.76522 0.107470
3  0.072037      2   0.47632 0.68115 0.092627
4  0.063469      3   0.40428 0.67320 0.094830
5  0.036190      4   0.34081 0.58516 0.096726
6  0.034677      5   0.30462 0.56747 0.074953
7  0.018219      6   0.26995 0.52036 0.069447
8  0.017033      7   0.25173 0.54695 0.074249
9  0.010672      8   0.23469 0.55741 0.075119
10 0.010000      9   0.22402 0.55756 0.074017
Here is the code to run the model 50 times and take the minimum xerror
each time, followed by the histogram. However, to find the tree size that
satisfies the 1-se rule I need to look at the rows whose xerror is no
larger than (minimum xerror + the xstd of that minimum row) and, among
those, select the one with the smallest "nsplit".
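> library(rpart)
> cp50 <- replicate(50, {
+   fit1 <- rpart(chbiomsq ~ HC + BC + POC + RUG + Depth + Exp + DFP + FI + LAT,
+                 data = ch, method = "anova",
+                 control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
+   x <- fit1$cptable  # same matrix that printcp() returns, without printing it
+   x[which.min(x[, "xerror"]), "nsplit"]  # nsplit of the minimum-xerror tree
+ })
> hist(cp50, main = "optimal splits for tree",
+      xlab = "no. of optimal tree splits", ylab = "frequency")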
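For reference, the 1-se selection described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `one_se_nsplit` and `cp50_1se` are hypothetical names (not part of rpart), the rule is taken to mean "smallest nsplit among rows with xerror <= minimum xerror + the xstd of that minimum row", and the small matrix simply re-enters three columns of the table printed above so the helper can be checked without the `ch` data.

```r
# Hypothetical helper (not part of rpart): apply the 1-se rule to a cptable.
one_se_nsplit <- function(cptable) {
  best <- which.min(cptable[, "xerror"])
  # Threshold = minimum xerror plus the xstd on that same row
  threshold <- cptable[best, "xerror"] + cptable[best, "xstd"]
  within_1se <- cptable[, "xerror"] <= threshold
  # Fewest splits among the rows that fall under the threshold
  min(cptable[within_1se, "nsplit"])
}

# Three columns of the table printed in this post, re-entered by hand
cptab <- cbind(
  nsplit = 0:9,
  xerror = c(1.02074, 0.76522, 0.68115, 0.67320, 0.58516,
             0.56747, 0.52036, 0.54695, 0.55741, 0.55756),
  xstd   = c(0.139585, 0.107470, 0.092627, 0.094830, 0.096726,
             0.074953, 0.069447, 0.074249, 0.075119, 0.074017)
)
one_se_nsplit(cptab)  # 4: threshold is 0.52036 + 0.069447 = 0.589807

# Assumes the data frame `ch` from this post is loaded; skipped otherwise.
if (exists("ch")) {
  library(rpart)
  cp50_1se <- replicate(50, {
    fit1 <- rpart(chbiomsq ~ HC + BC + POC + RUG + Depth + Exp + DFP + FI + LAT,
                  data = ch, method = "anova",
                  control = rpart.control(minsplit = 10, cp = 0.01, xval = 10))
    one_se_nsplit(fit1$cptable)
  })
  hist(cp50_1se, main = "1-se splits for tree",
       xlab = "no. of splits (1-se rule)", ylab = "frequency")
}
```

On the single table shown above, the minimum xerror is on row 7 (6 splits), while the 1-se rule as sketched here picks row 5 (4 splits).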
Any help appreciated.
Andy
--
Andrew Halford Ph.D
Adjunct Research Scientist
University of Guam & Curtin University
Ph: +61 (0) 468 419 473