Terry Therneau
2009-May-13 12:26 UTC
[R] questions on rpart (tree changes when rearrange the order of covariates)
If two variables have exactly the same split importance, then rpart will use the one that appears first in the model statement. So if the call is rpart(group ~ age + height + weight + sex) and at some split point both age and weight give a split with 20 correct and 9 incorrect, then age will be used to split at that node.

Even though the error of the age and weight splits is the same, the set of 9 subjects that are incorrect may be different, i.e., the two splits don't send exactly the same observations to the left and the right. Thus the rest of the tree from that point on may be different, giving a different fit.

For continuous y it rarely happens that two splits have exactly the same R^2, but exact ties are not uncommon in classification problems.

Terry Therneau
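A small sketch of the situation described above may help. Everything in it is an assumption made for illustration: the data frame d is made up, the cut points 43 and 73 were picked by hand, and the control settings (minsplit = 2, cp = 0, maxdepth = 1, xval = 0) are there only so that a single split is made on 11 rows. The data are built so that the best split on age and the best split on weight have identical class counts but misplace a different subject.

```r
library(rpart)

# Made-up data: 6 subjects in group A, 5 in group B.
# Row 6 is the only A with a high age; row 5 is the only A with a high weight.
d <- data.frame(
  group  = factor(c(rep("A", 6), rep("B", 5))),
  age    = c(21, 22, 23, 24, 25, 66, 61, 62, 63, 64, 65),
  weight = c(51, 52, 53, 54, 96, 55, 91, 92, 93, 94, 95)
)

# Class counts on each side of a candidate cut (base R, independent of rpart).
side_counts <- function(x, cut) table(low = x < cut, group = d$group)
age_counts    <- side_counts(d$age, 43)     # 5 A / 0 B low, 1 A / 5 B high
weight_counts <- side_counts(d$weight, 73)  # same counts, different rows

# Same model, covariates listed in a different order.
ctrl <- rpart.control(minsplit = 2, cp = 0, maxdepth = 1, xval = 0)
fit_age_first    <- rpart(group ~ age + weight, data = d,
                          method = "class", control = ctrl)
fit_weight_first <- rpart(group ~ weight + age, data = d,
                          method = "class", control = ctrl)

# Variable chosen at the root of each tree.
as.character(fit_age_first$frame$var[1])
as.character(fit_weight_first$frame$var[1])

# Rows falling in each terminal node; compare the two partitions.
split(seq_len(nrow(d)), fit_age_first$where)
split(seq_len(nrow(d)), fit_weight_first$where)
```

If the two fits pick different root variables, the last two lines show that rows 5 and 6 change sides, which is why anything grown below that node can differ as well.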