Hi,

I am trying to write my own split function for rpart. The aim is to use, instead of anova, a linear regression to determine the split (i.e., minimize some criterion such as the sum of the RSS to the left and right of the split). The regression (lm) should simply use the dependent and independent variables passed to rpart.

I am aware of the example provided in the rpart source code, but I ran into problems similar to those I have seen reported on this list (with no final solution posted, as far as I could see). Broadly speaking, the problem is that I do not see a way to access the full set of x and y variables in the user-written split function.

I must admit that I am totally new to R programming (this is, so far, a one-time excursion into R), so I would appreciate concrete advice (actual code would be much appreciated, but I don't want to ask too much).

Best regards,
Bart Verspagen
Maastricht University, Department of Economics & UNU-MERIT
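To make the criterion described above concrete, here is a minimal standalone sketch in base R. It does not go through rpart at all: it takes one candidate split variable `z`, tries every cut point between distinct values of `z`, fits a separate `lm()` on each side, and reports the total RSS for each cut. The function name `split_rss`, its arguments (`y`, `X`, `z`, `min_node`) and the simulated data are hypothetical and only illustrate the "sum of RSS left and right of the split" idea; they are not part of rpart's interface. It also assumes `X` is a data frame of regressors with no column named `y`.

```r
# Hypothetical helper (not part of rpart): for ONE candidate split variable z,
# evaluate each cut point by the total RSS of separate lm() fits on each side.
# Assumes X is a data frame of regressors without a column named "y".
split_rss <- function(y, X, z, min_node = 5) {
  n <- length(y)
  stopifnot(n >= 2 * min_node)
  ord <- order(z)                          # sort all inputs by the split variable
  z <- z[ord]
  dat <- data.frame(y = y[ord], X[ord, , drop = FALSE])

  # RSS of an OLS fit of y on all columns of X, using the given rows only
  node_rss <- function(rows) {
    fit <- lm(y ~ ., data = dat[rows, , drop = FALSE])
    sum(residuals(fit)^2)
  }

  # Candidate cuts keep at least min_node rows on each side and fall
  # only between distinct values of z
  cand <- seq(min_node, n - min_node)
  cand <- cand[z[cand] != z[cand + 1]]
  total <- vapply(cand, function(i) node_rss(1:i) + node_rss((i + 1):n),
                  numeric(1))
  cuts <- (z[cand] + z[cand + 1]) / 2       # midpoint between neighbouring z values

  list(cut = cuts, rss = total, best = cuts[which.min(total)])
}

# Usage on simulated data: the slope of y on x changes at z = 0.5
set.seed(1)
n <- 100
z <- runif(n)
x <- rnorm(n)
y <- ifelse(z < 0.5, 1 + 2 * x, -1 - 3 * x) + rnorm(n, sd = 0.1)
res <- split_rss(y, data.frame(x = x), z)
res$best   # should lie close to 0.5
```

Because rpart hands a user split function one x variable at a time, a sketch like this lives outside rpart; the regressors used inside each `lm()` fit here are supplied separately through `X`.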
--- begin included message ---
Hi, I am trying to write my own split function for rpart. The aim is to do, instead of anova, a linear regression to determine the split (minimize some criterion like sum of rss left and right of the split). The regression (lm) should simply use the dependent and independent variables passed to rpart. I am aware of the example provided in the rpart source code, but stumbled on similar problems that I saw reported on this list (no final solution posted, as far as I could see). The problem is, broadly speaking, that I do not see a way to access the full set of x and y variables in the user-written split-function.
---- end inclusion -----------

The rpart routine passes the x variables to a user-written split function one at a time. Since the entire structure of rpart (printing, plotting, tree representation, etc.) rests on the premise that a single variable drives each split, what you are asking for would require an entirely different program.

Terry Therneau