Dear All,

I would like to use rpart to obtain a regression tree for a dataset
like the following:

Y           X1  X2  X3  X4
5.500033    B   A   3   2
0.35625148  D   B   6   5
0.8062546   E   C   4   3
5.100014    C   A   3   2
5.7000422   A   A   3   2
0.76875436  C   A   6   5
1.0312537   D   A   4   1

Y is the objective variable. X1, X2, X3 and X4 can take, respectively,
the following values:

X1: A,B,C,D,E
X2: A,B,C,D,E
X3: 3,4,5,6
X4: 1,2,3,4,5

Should I convert X3 and X4 to factors before running rpart?

Thanks in advance,

Paul

On Mon, 22 Jan 2007, Paul Smith wrote:

> [...]
> Should I convert X3 and X4 to factor before running rpart?

If they really are factors, yes.
If they are ordered factors, no.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
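
For readers following along, here is a minimal sketch of the two codings the reply distinguishes, using the seven rows from the original post. The object names (`dat_nominal`, `dat_ordinal`, `fit_nominal`, `fit_ordinal`) and the `rpart.control()` settings are assumptions made only so that such a tiny sample can split at all; they are not part of the original advice. As far as I understand rpart, an unordered factor is split on subsets of its levels, while an ordered factor (or a plain numeric column) is split on a threshold.

```r
library(rpart)

# The seven rows from the original post
dat <- data.frame(
  Y  = c(5.500033, 0.35625148, 0.8062546, 5.100014,
         5.7000422, 0.76875436, 1.0312537),
  X1 = factor(c("B", "D", "E", "C", "A", "C", "D"), levels = LETTERS[1:5]),
  X2 = factor(c("A", "B", "C", "A", "A", "A", "A"), levels = LETTERS[1:5]),
  X3 = c(3, 6, 4, 3, 3, 6, 4),
  X4 = c(2, 5, 3, 2, 2, 5, 1)
)

# Case 1: X3 and X4 are really nominal labels -> unordered factors
dat_nominal <- dat
dat_nominal$X3 <- factor(dat$X3, levels = 3:6)
dat_nominal$X4 <- factor(dat$X4, levels = 1:5)

# Case 2: X3 and X4 have a meaningful order -> ordered factors
# (leaving them numeric should give the same kind of threshold splits)
dat_ordinal <- dat
dat_ordinal$X3 <- ordered(dat$X3, levels = 3:6)
dat_ordinal$X4 <- ordered(dat$X4, levels = 1:5)

# Assumed control settings: the defaults (minsplit = 20) would never split
# seven rows; xval = 0 turns off cross-validation for this toy example
ctrl <- rpart.control(minsplit = 2, cp = 0, xval = 0)

fit_nominal <- rpart(Y ~ X1 + X2 + X3 + X4, data = dat_nominal,
                     method = "anova", control = ctrl)
fit_ordinal <- rpart(Y ~ X1 + X2 + X3 + X4, data = dat_ordinal,
                     method = "anova", control = ctrl)

print(fit_nominal)
print(fit_ordinal)
```

Which coding is appropriate depends on what X3 and X4 actually measure, which the thread does not say.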

On 1/23/07, Prof Brian Ripley <ripley at stats.ox.ac.uk> wrote:

> > [...]
> > Should I convert X3 and X4 to factor before running rpart?
>
> If they really are factors, yes.
> If they are ordered factors, no.

Thanks, Prof. Ripley. Is it correct to adopt the same procedure for
classification trees, i.e., when the objective variable (Y) is
categorical and X1, X2, X3 and X4 are as above?

Paul
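
To make the classification case concrete, here is a hedged sketch rather than an answer from the thread. It assumes a hypothetical two-level response `Ycat`, obtained by cutting the posted Y at an arbitrary threshold of 2, and it codes X3 and X4 as ordered factors purely for illustration. The control settings are again assumptions needed for a seven-row sample. My understanding is that rpart handles predictors by their own type regardless of `method`, but that is worth checking against the rpart documentation.

```r
library(rpart)

# Same seven rows as in the original post; X3/X4 coded as ordered factors
dat <- data.frame(
  Y  = c(5.500033, 0.35625148, 0.8062546, 5.100014,
         5.7000422, 0.76875436, 1.0312537),
  X1 = factor(c("B", "D", "E", "C", "A", "C", "D"), levels = LETTERS[1:5]),
  X2 = factor(c("A", "B", "C", "A", "A", "A", "A"), levels = LETTERS[1:5]),
  X3 = ordered(c(3, 6, 4, 3, 3, 6, 4), levels = 3:6),
  X4 = ordered(c(2, 5, 3, 2, 2, 5, 1), levels = 1:5)
)

# Hypothetical categorical response: the cut-off of 2 is arbitrary and
# exists only to give this example a factor-valued Y
dat$Ycat <- factor(ifelse(dat$Y > 2, "high", "low"))

# method = "class" requests a classification tree; the predictor columns
# are passed exactly as they would be for the regression tree
fit_class <- rpart(Ycat ~ X1 + X2 + X3 + X4, data = dat,
                   method = "class",
                   control = rpart.control(minsplit = 2, cp = 0, xval = 0))

print(fit_class)
```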