Gabor Grothendieck
2004-Jun-24 14:08 UTC
[R] tree model with at most one split point per variable
I would like to create a tree model with at most one split point per variable, using tree, rpart or some other routine. It's OK if a variable enters at more than one node, but if it does then all splits for that variable should be at the same point.

The idea is that I want to be able to summarize the data as binary factors with the chosen split points. I don't want factors with three or more levels.

For example, the following shows that the first split is on Petal.Length at 2.45; however, there are other splits on Petal.Length at 4.95. I want to disallow that.

R> data(iris)
R> tree(Species ~., data = iris)
node), split, n, deviance, yval, (yprob)
      * denotes terminal node

 1) root 150 329.600 setosa ( 0.33333 0.33333 0.33333 )
   2) Petal.Length < 2.45 50 0.000 setosa ( 1.00000 0.00000 0.00000 ) *
   3) Petal.Length > 2.45 100 138.600 versicolor ( 0.00000 0.50000 0.50000 )
     6) Petal.Width < 1.75 54 33.320 versicolor ( 0.00000 0.90741 0.09259 )
      12) Petal.Length < 4.95 48 9.721 versicolor ( 0.00000 0.97917 0.02083 )
        24) Sepal.Length < 5.15 5 5.004 versicolor ( 0.00000 0.80000 0.20000 ) *
        25) Sepal.Length > 5.15 43 0.000 versicolor ( 0.00000 1.00000 0.00000 ) *
      13) Petal.Length > 4.95 6 7.638 virginica ( 0.00000 0.33333 0.66667 ) *
     7) Petal.Width > 1.75 46 9.635 virginica ( 0.00000 0.02174 0.97826 )
      14) Petal.Length < 4.95 6 5.407 virginica ( 0.00000 0.16667 0.83333 ) *
      15) Petal.Length > 4.95 40 0.000 virginica ( 0.00000 0.00000 1.00000 ) *
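To make the intended summary concrete, here is a rough sketch of the kind of binary factors I have in mind. It assumes the cut points have already been chosen somehow; the values 2.45 and 1.75 are simply the first split on each variable in the output above, and the names cuts and iris_bin are made up for illustration. It does not solve the problem of making the tree routine pick a single split point per variable.

```r
# Hypothetical cut points: the first split on each variable in the tree output
cuts <- c(Petal.Length = 2.45, Petal.Width = 1.75)

# Build one two-level factor per variable: "low" below the cut, "high" above it
iris_bin <- data.frame(Species = iris$Species)
for (v in names(cuts)) {
  iris_bin[[v]] <- factor(iris[[v]] > cuts[[v]],
                          levels = c(FALSE, TRUE),
                          labels = c("low", "high"))
}

# Cross-tabulate the binary summary against the response
print(table(iris_bin$Petal.Length, iris_bin$Petal.Width, iris_bin$Species))
```

With one cut point per variable every predictor stays a two-level factor, which is the summary I am after; a second cut on Petal.Length at 4.95 would turn it into a three-level factor.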