Hi,

I am planning to use classification trees to build a predictive model for data that include a random (grouping) variable. I intend to use the R function 'rpart' (and potentially also 'randomForest' and 'bagging').

I have a data set with 390 data points. The response variable is binary. There are a large number of predictor variables (>20, both categorical and continuous). The random variable is 'site', which is the number of the site at which the data were collected. There are 36 sites (with 6-12 data points per site).

My understanding of incorporating a random variable into a classification tree is that, during cross-validation, each 'group' of the random variable should be left out in turn and used to test the model fitted to the remaining groups.

My first question: is this correct? If so, is it appropriate for my data set, given that for some sites this will remove less than 2% of the data?

My second question (assuming a positive response to the first) is how this is achieved in R. The only way I can see to do it is to pass the variable 'site' as the 'xval' value. Below is a simplified version of the model showing how I have done this. Is this correct?

    hp1 <- rpart(formula = hollowpres ~ dbh + lat + long + alt,
                 data = test, method = "class",
                 control = rpart.control(maxcompete = 4, xval = site),
                 na.action = na.rpart)

Thanks
Amy
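For reference, the leave-one-site-out procedure described in the post can be written out by hand, which makes explicit what each cross-validation step does. The following is only a minimal sketch under stated assumptions: the data frame is simulated (only the column names 'hollowpres', 'dbh', 'lat', 'long', 'alt' and 'site' come from the post; the values, the site sizes and the names 'n_per_site', 'pred' and 'cv_error' are made up for illustration), and it is not necessarily how 'rpart' handles 'xval' internally.

```r
library(rpart)

set.seed(1)

# Simulated stand-in for the poster's data (hypothetical values):
# 36 sites with 10 or 11 rows each, 390 rows in total.
n_per_site <- rep(c(11, 10), times = c(30, 6))
test <- data.frame(
  site = factor(rep(seq_along(n_per_site), times = n_per_site)),
  dbh  = runif(390, 10, 150),
  lat  = runif(390, -38, -36),
  long = runif(390, 144, 146),
  alt  = runif(390, 0, 1200)
)
# Binary response, loosely related to dbh (an arbitrary assumption).
test$hollowpres <- factor(rbinom(390, 1, plogis((test$dbh - 80) / 30)))

# Leave-one-site-out cross-validation: fit on 35 sites,
# then predict the rows of the single held-out site.
pred <- factor(rep(NA, nrow(test)), levels = levels(test$hollowpres))
for (s in levels(test$site)) {
  held_out <- test$site == s
  fit <- rpart(hollowpres ~ dbh + lat + long + alt,
               data = test[!held_out, ], method = "class")
  pred[held_out] <- predict(fit, newdata = test[held_out, ], type = "class")
}

# Proportion of held-out rows that were misclassified.
cv_error <- mean(pred != test$hollowpres)
cv_error
```

The loop refers to 'test$site' explicitly. In the call shown in the post, 'site' inside 'rpart.control(...)' would, as far as I can tell, be looked up in the calling environment rather than in 'data', so it is worth checking that it resolves to the intended column. The help page for 'xpred.rpart' also describes an 'xval' argument that may be an explicit vector of integers defining the cross-validation groups, which may be a more direct route than the manual loop.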