hhafner at statistik-hessen.de
2010-Apr-12 09:50 UTC
[R] rpart: Writing values of the leaves to a dataset
I'm fitting a regression tree with rpart and I want to write the values for every leaf to a dataset. As an example take the variable turnover. Let's suppose my tree for turnover has 30 leaves and I want to have 30 datasets, with dataset 1 containing the turnover values of the units in leaf 1, dataset 2 containing the turnover values of the observations in leaf 2, and so on. How can I do this?

Best regards,
Hans-Peter Hafner
-- begin inclusion --
I'm fitting a regression tree with rpart and I want to write the values for every leaf in a dataset. As an example take the variable turnover. Let's suppose my tree for turnover has 30 leaves and I want to have 30 datasets with dataset 1 containing the turnover values of the units in leaf 1, dataset 2 containing turnover values for the observations in leaf 2 and so on. How can I do this?
-- end inclusion --

    fit <- rpart(y ~ ......., data=mydata)
    parts <- tapply(mydata$y, predict(fit), c)

Then parts will be a list with one element per leaf of the tree, each containing the values of y found in that leaf. The elements are grouped by the predicted value, so this assumes no two leaves share the same fitted mean. An alternative is

    indices <- tapply(1:nrow(mydata), predict(fit), c)

which will give a list containing row numbers.

Terry T.
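A minimal sketch of the same idea, for anyone who wants something they can run as is. It makes a few assumptions that are not in the posts above: the built-in mtcars data stands in for mydata, mpg stands in for turnover, and the leaf_*.csv file names are purely illustrative. Instead of grouping by predict(fit), it groups by fit$where, the component of an rpart object that records which row of fit$frame (that is, which leaf) each observation fell into, so two leaves with identical fitted means would still be kept apart.

```r
library(rpart)

# Assumption: mtcars stands in for mydata, mpg for the response (turnover)
fit <- rpart(mpg ~ ., data = mtcars)

# fit$where holds, for each observation used in the fit, the row number of
# fit$frame corresponding to the leaf that observation falls into
leaf_id <- fit$where

# Response values and row numbers, one list element per leaf
parts   <- split(mtcars$mpg, leaf_id)
indices <- split(seq_len(nrow(mtcars)), leaf_id)

# One complete data frame per leaf
leaf_data <- split(mtcars, leaf_id)

# Optionally write each leaf to its own file (hypothetical file names)
out_dir <- tempdir()
for (id in names(leaf_data)) {
  write.csv(leaf_data[[id]], file.path(out_dir, paste0("leaf_", id, ".csv")))
}
```

The names of parts are the row numbers in fit$frame, so fit$frame[names(parts), ] looks up the size and fitted mean of each leaf. If rows were dropped because of missing values, fit$where only covers the rows actually used in the fit, and the split would have to be applied to those rows only.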