AFAIK rpart does not have a built-in facility for adjusting bias in split
selection. One possibility is to define your own splitting criterion that
does the adjustment in some fashion. I believe the current version of rpart
allows you to define a custom splitting criterion, but I have not tried it
myself.
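
For what it's worth, here is a minimal sketch of the 'cost' idea raised in
the question, since it needs no custom splitting code. As I read the rpart
documentation, the improvement from splitting on a variable is divided by
that variable's cost when choosing a split. The data frame, the variable
names (f2, f13, y) and the sqrt(nlevels) scaling are all made up for
illustration, and I am not claiming this is a principled bias correction.

```r
library(rpart)

set.seed(1)
n <- 500

# Hypothetical data: one 2-level factor and one 13-level factor.
d <- data.frame(
  f2  = factor(sample(c("a", "b"), n, replace = TRUE)),
  f13 = factor(sample(LETTERS[1:13], n, replace = TRUE))
)
# Only f2 carries signal; f13 is pure noise.
d$y <- rnorm(n) + ifelse(d$f2 == "a", 1, 0)

# Number of distinct binary partitions of a k-level unordered factor.
n_partitions <- function(k) 2^(k - 1) - 1

# Assumed scaling: cost grows with sqrt(nlevels), as suggested in the question.
# One cost per predictor, in the order the predictors appear in the formula.
k     <- sapply(d[c("f2", "f13")], nlevels)
costs <- unname(sqrt(k))

fit_default <- rpart(y ~ f2 + f13, data = d)
fit_costed  <- rpart(y ~ f2 + f13, data = d, cost = costs)

# Compare which variables each tree actually splits on.
print(fit_default)
print(fit_costed)
```

Comparing the two printed trees on your own data should show whether the
penalty changes which variable is picked; how strong the scaling ought to
be is a judgment call.
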
Prof. Wei-yin Loh at UW-Madison (and his current and former students) have
worked on algorithms that compensate for bias in split selection. There is
software on his web page that you might want to check out.
HTH,
Andy
> From: lsjensen at micron.com
>
> Wondered about the best way to control for input variables that have a
> large number of levels in 'rpart' models. I understand the algorithm
> searches through all possible splits (2^(k-1) - 1 for k levels), so
> variables with more levels are more prone to be good splitters. I'm
> looking for ways to compensate and adjust for this complexity.
>
> For example, if two variables produce comparable splits in the data but
> one contains 2 levels and the other 13 levels, then I would like to
> have the algorithm choose the 'simpler' split.
>
> Is this best done with the 'cost' argument in the rpart options? This
> defaults to one for all variables, so would it make sense to scale
> this by nlevels in each variable or sqrt(nlevels) or something similar?
>
> Thanks,
> Landon