On Sun, 8 Jul 2007, Aurélie Davranche wrote:
> Hi!
>
> Could you please explain the difference between "prior" and "weight" in
> rpart? It seems to be the same. But in this case why including a weight
> option in the latest versions? For an unbalanced sampling what is the best
> to use : weight, prior or the both together?
The 'weight' argument (sic) has been there for a decade, and is not the
same as the 'prior' param.
The help file (which you seem unfamiliar with) says

 weights: optional case weights.

   parms: optional parameters for the splitting function. Anova
          splitting has no parameters. Poisson splitting has a single
          parameter, the coefficient of variation of the prior
          distribution on the rates. The default value is 1.
          Exponential splitting has the same parameter as Poisson. For
          classification splitting, the list can contain any of: the
          vector of prior probabilities (component 'prior'), the loss
          matrix (component 'loss') or the splitting index (component
          'split'). The priors must be positive and sum to 1. The
          loss matrix must have zeros on the diagonal and positive
          off-diagonal elements. The splitting index can be 'gini' or
          'information'. The default priors are proportional to the
          data counts, the losses default to 1, and the split defaults
          to 'gini'.

The rpart technical report at
http://mayoresearch.mayo.edu/mayo/research/biostat/upload/61.pdf
may help you understand this.
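
As a minimal sketch of how the two are supplied (not a recommendation of
either for unbalanced sampling): 'weights' takes one value per case, whereas
'prior' takes one probability per class and goes inside 'parms'. The
kyphosis data shipped with rpart, the weight of 4, and the equal priors are
illustrative assumptions, not taken from the original question.

```r
library(rpart)

# Illustrative data only: 'kyphosis' ships with rpart and has a
# two-class response (absent / present).

# Case weights: a numeric vector with one entry per observation.
# The value 4 is an arbitrary choice for illustration.
w <- ifelse(kyphosis$Kyphosis == "present", 4, 1)
fit_w <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", weights = w)

# Priors: one probability per class (in factor-level order), positive
# and summing to 1, passed as the 'prior' component of 'parms'.
fit_p <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", parms = list(prior = c(0.5, 0.5)))

# Compare the two fitted trees.
print(fit_w)
print(fit_p)
```

Both arguments can also be given in the same call; how they interact is
covered in the technical report.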
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595