thr3ads.net - R help - [R] How to read this Rpart decision tree [Feb 2015]

Therneau, Terry M., Ph.D.

2015-Feb-11 13:17 UTC

[R] How to read this Rpart decision tree

First:
    summary(ss.rpart1)
or
    summary(ss.rpart1, file="whatever")

The printout will be quite long since your tree is so large, so the second
form may be best, followed by a perusal of the file with your favorite text
editor.  The file name of "whatever" above should be something you choose, of
course.  This will give you a full description of the tree.  Read the first
node or two very carefully so that you understand what the fit did.
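
As a rough sketch of the second form, the lines below fit a small
classification tree and write its summary to a file.  The built-in iris data
and the file name rpart_summary.txt are stand-ins chosen for illustration;
they are not the poster's data or tree.

```r
library(rpart)

# Assumption: iris (3 classes) stands in for the poster's multi-product data.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Hypothetical output file; any writable path will do.
out_file <- file.path(tempdir(), "rpart_summary.txt")
summary(fit, file = out_file)

# Peek at the start of the file before opening it in an editor.
summary_lines <- readLines(out_file)
head(summary_lines, 20)
```

The same file can then be opened in any text editor and read node by node.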
   Plotting routines for trees have to make display choices, since there
simply is not enough space available to list all the details.  You have a
complicated endpoint with at least 14 different products.  The predicted value
for each node of the tree is a vector of proportions (one per product, summing
to one); plots often show only the name of the most frequent product.  The
alive/dead endpoint for the Titanic data is a lot easier to fit into a little
plotting oval, so of course the plotted tree is easier to grasp.
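
To make the "vector per node" point concrete, here is a small sketch that
pulls the per-node class proportions out of a fitted classification tree.  It
again uses iris as a stand-in, and it relies on the yval2 layout described in
the rpart.object help page (fitted class, class counts, class probabilities,
node probability); treat the column arithmetic as an assumption to check
against your installed version.

```r
library(rpart)

# Assumption: iris (3 classes) stands in for the poster's multi-product data.
fit <- rpart(Species ~ ., data = iris, method = "class")

k <- nlevels(iris$Species)

# Columns of yval2 for a classification tree:
#   1 fitted class, k class counts, k class probabilities, 1 node probability
node_probs <- fit$frame$yval2[, (k + 2):(2 * k + 1), drop = FALSE]
colnames(node_probs) <- levels(iris$Species)
rownames(node_probs) <- rownames(fit$frame)   # node numbers
round(node_probs, 3)

# A plot typically shows only the most frequent class in each node.
levels(iris$Species)[max.col(node_probs, ties.method = "first")]
```

Each row is one node and sums to one; with 14 or more products the rows
simply get too wide to print inside a plotted node.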

Terry T.

On 02/11/2015 05:00 AM, r-help-request at r-project.org wrote:
> Hi all,
>
> In the attachment or this link (http://oi58.tinypic.com/35ic9qc.jpg)
> you'll find the decision tree I made. I used the rpart package to make the
> tree and fancyRpartPlot from the rattle package to plot it. The data in the
> tree looks different from about every example I have seen before. I don't
> understand how I should read it. I want to predict Product (which are
> productkeys). The variables used to predict it are age, incomegroup, gender,
> totalchildren, education, occupation, houseownerflag, numberCars. It looks
> like the upper number is a ProductKey. The "n" is the number of observations?
> And the percentage of the yes/no question below.
>
> This is the code I used.
>
>     library(rpart)
>     library(rattle)   # provides fancyRpartPlot
>     ss.rpart1 <- rpart(Product ~ ., data=sstrain,
>                        control=rpart.control(minbucket=2, minsplit=1, cp=-1))
>     # pick the complexity parameter with the lowest cross-validated error
>     opt <- which.min(ss.rpart1$cptable[, "xerror"])
>     cp <- ss.rpart1$cptable[opt, "CP"]
>     ss.rpart2 <- prune(ss.rpart1, cp=cp)
>     fancyRpartPlot(ss.rpart2)
>
> So why does the tree look so different from most others (for example:
> http://media.tumblr.com/a9f482ff88b0b9cfaffca7ffd46c6a8e/tumblr_inline_mz7pyuaYJQ1s5wtly.png)?
> That one is from Trevor Stephens' Titanic tutorial. The first node shows
> that 62% of 100% don't survive. If they were male, only 19% of them were
> survivors. I find that a lot of examples look like that. Why does mine
> predict per ProductKey, with something different in every node? It doesn't
> make sense to me. And it doesn't have the two numbers like .62 and .38, but
> it has n=197e+3. So should I read the first node like "For 100% of the
> observations of ProductKey 1074, the incomegroup was moderate"?
>
> Thank you!
>
> Kim
