Hi,

I am using rpart as part of my master's project. I am trying to draw the resulting model using the plot() function along with the text() function, but the labels are being cut off. In text(), I am using the use.n=T option to get the number of people in each node, but on the lower and left parts of the plot the numbers get cut off.

Thanks!
Linus
Some answers are on the help pages for plot.rpart and text.rpart.

On Mon, 12 May 2008, Linus An wrote:

> [...]
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

PLEASE do.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
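For readers following the pointer to the help pages, here is a minimal sketch of the two usual remedies for clipped labels: the `margin` argument of plot.rpart and the `xpd` graphics parameter. The `kyphosis` data set (shipped with rpart) is only a stand-in for the poster's data, and the values `margin = 0.1` and `cex = 0.8` are illustrative assumptions, not recommendations from the thread.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the poster's own data
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

old <- par(xpd = NA)                # allow text to be drawn outside the plot region
plot(fit, margin = 0.1)             # extra fraction of white space around the tree
text(fit, use.n = TRUE, cex = 0.8)  # node counts, slightly smaller text
par(old)                            # restore the previous graphics settings
```

If labels are still clipped, a larger `margin` or a smaller `cex` is the first thing to try.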
Hi all,

I am using RPART for my genetic study with the ANOVA method. I wanted to know if it is possible to see the r-squared, or the amount of variance in the data explained by a model (or a tree, in this case, from the RPART package). I am guessing that there has to be one since I am using ANOVA to estimate the tree, but I have not had any luck finding it.

Thanks!
Linus An
Biostatistics Division
Washington University in St. Louis School of Medicine
When using the anova method, all of the printed results are scaled by the RSS
for the top node. Therefore the relative error measures for the trees are
already 1 - R^2.
library(rpart)
library(survival)  # provides the lung data set
tfit <- rpart(time ~ ., lung)
summary(tfit)
          CP nsplit rel error   xerror      xstd
1 0.03665178      0 1.0000000 1.010097 0.1136942
2 0.03310179      1 0.9633482 1.079216 0.1172675
3 0.03029365      2 0.9302464 1.109587 0.1173583
4 0.01963453      3 0.8999528 1.249586 0.1327888
5 0.01627146     11 0.7396726 1.238411 0.1310952
6 0.01507635     12 0.7234012 1.260919 0.1337384
7 0.01031566     13 0.7083248 1.282740 0.1399397
8 0.01000000     14 0.6980091 1.296213 0.1396711

Node number 1: 228 observations, complexity param=0.03665178
  mean=305.2325, MSE=44176.93
  left son=2 (81 obs) right son=3 (147 obs)
  Primary splits:
      pat.karno < 75   to the left,  improve=0.03661157, (3 missing)
      ph.ecog   < 1.5  to the right, improve=0.03620793, (1 missing)
      status    < 1.5  to the right, improve=0.02930372, (0 missing)
      ph.karno  < 85   to the left,  improve=0.02058114, (1 missing)
      sex       < 1.5  to the left,  improve=0.01679999, (0 missing)
  Surrogate splits:
      ph.ecog   < 1.5  to the right, agree=0.787, adj=0.392, (3 split)
      ph.karno  < 75   to the left,  agree=0.751, adj=0.291, (0 split)
      age       < 72.5 to the right, agree=0.680, adj=0.089, (0 split)

Node number 2: 81 observations, complexity param=0.03310179
  mean=251.0247, MSE=34100.99
  left son=4 (59 obs) right son=5 (22 obs)
  Primary splits:
      wt.loss   < 21   to the left,  improve=0.12735970, (7 missing)
      status    < 1.5  to the right, improve=0.08060663, (0 missing)
      age       < 68.5 to the right, improve=0.04906869, (0 missing)
      inst      < 2.5  to the left,  improve=0.04148716, (0 missing)
      sex       < 1.5  to the left,  improve=0.02401074, (0 missing)
  Surrogate splits:
      ph.karno  < 55   to the right, agree=0.743, adj=0.095, (6 split)
etc.

The first split has R^2 = 0.0367 = 1 - rel error at nsplit = 1 (0.9633, from the
table at the top), which is also the improvement measure for that node.
The second split has R^2 = 0.127 for the observations within that node; it
improves the R^2 for the model as a whole by 0.033.
Terry T.
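
The relationship described above can be checked directly from the fitted object. The following is a minimal sketch, assuming the `lung` data come from the survival package. It reads the `cptable` component of the rpart fit and converts the relative error column to R^2. The names `r2` and `xval_r2` are introduced here for illustration only. The cross-validated values depend on the random fold assignment, so they will vary from run to run and can be negative.

```r
library(rpart)
library(survival)  # assumed source of the lung data set

tfit <- rpart(time ~ ., data = lung)

# cptable is the same table shown at the top of summary(tfit)
cp <- tfit$cptable

r2      <- 1 - cp[, "rel error"]  # apparent (resubstitution) R^2 per tree size
xval_r2 <- 1 - cp[, "xerror"]     # cross-validated analogue; may be negative

data.frame(nsplit = cp[, "nsplit"], R2 = r2, xval_R2 = xval_r2)

# rsq.rpart(tfit) plots the same quantities against the number of splits
```

The apparent R^2 always increases with the number of splits, so the cross-validated column is the more honest guide to how much variance the tree explains on new data.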