thr3ads.net - R help - [R] Classification tree with a random variable [Aug 2006]

Amy Koch

2006-Aug-24 02:00 UTC

[R] Classification tree with a random variable

Hi,

I am planning to use classification trees to build a predictive model for data
that include a random variable. I intend to use the R function 'rpart' (and
potentially also 'randomForest' and 'bagging').

I have a data set with 390 data points. The response variable is binary. There
are more than 20 predictor variables, both categorical and continuous. The
random variable is 'site', the site number at which each data point was
collected. There are 36 sites (with 6-12 data points per site).

My understanding of incorporating a random variable into a classification tree
is that each 'group' of the random variable (here, each site) should be held
out in turn and used to test the model during cross-validation. My first
question: is this correct? If so, is it appropriate for my data set, given
that for some sites this will hold out less than 2% of the data?
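
As a quick check of the fold sizes implied by the figures above (390 points, 6 to 12 per site), the share of data held out per site can be computed directly. The variable names here are illustrative only.

```r
# Share of the data held out when a single site is removed,
# using the figures quoted in the post: 390 points, 6-12 per site.
n_total <- 390
site_sizes <- c(smallest = 6, largest = 12)
pct <- round(100 * site_sizes / n_total, 2)
print(pct)  # smallest 1.54, largest 3.08
```

On these numbers, sites with 6 or 7 points each account for less than 2% of the data.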

My second question (assuming a positive response to the first) concerns how
this is achieved in R. The only way I can figure out to do this is to pass the
variable 'site' as the 'xval' value. Below is a simplified version of the
model showing how I have done this. Is this correct?

hp1 <- rpart(hollowpres ~ dbh + lat + long + alt,
             data = test, method = "class",
             control = rpart.control(maxcompete = 4, xval = site),
             na.action = na.rpart)
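
For comparison, here is a minimal sketch of holding out one site at a time with an explicit loop, which avoids depending on what 'xval' accepts. The data frame is simulated and purely hypothetical (it only reuses the column names from the call above, and its row count will not be exactly 390); it is not the poster's data. Note also that arguments to rpart.control() are evaluated in the calling environment rather than inside 'data', so a bare 'site' is only found if it exists there; 'test$site' refers to the column explicitly.

```r
library(rpart)

set.seed(1)

# Hypothetical stand-in data: 36 sites with 6-12 points each.
n_per_site <- sample(6:12, 36, replace = TRUE)
site <- rep(seq_len(36), times = n_per_site)
n <- length(site)
test <- data.frame(
  site = site,
  dbh  = runif(n, 10, 150),
  lat  = runif(n, -38, -36),
  long = runif(n, 144, 146),
  alt  = runif(n, 100, 900)
)
# Simulated binary response, loosely tied to dbh.
test$hollowpres <- factor(rbinom(n, 1, plogis((test$dbh - 80) / 30)),
                          levels = c(0, 1))

# Leave-one-site-out cross-validation: fit on all other sites,
# then predict the class of every point at the held-out site.
pred <- character(n)
for (s in unique(test$site)) {
  held_out <- test$site == s
  fit <- rpart(hollowpres ~ dbh + lat + long + alt,
               data = test[!held_out, ], method = "class")
  pred[held_out] <- as.character(
    predict(fit, newdata = test[held_out, ], type = "class")
  )
}

# Overall misclassification rate across all held-out sites.
cv_error <- mean(pred != as.character(test$hollowpres))
print(cv_error)
```

Each point is predicted by a tree that never saw any data from its own site, which is the grouping behaviour described in the first question.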

Thanks
Amy