On Sun, 10 Sep 2006, Maciej Bliziński wrote:
> Hello all R-help list subscribers,
>
> I'd like to create a regression tree of a data set with binary response
> variable. Only 5% of observations are a success, so the regression tree
> will not find really any variable value combinations that will yield
> more than 50% of probability of success.
This would be a misuse of a regression tree: it is exactly the problem
for which classification trees were designed.
> I am however interested in areas where the probability of success is
> noticeably higher than 5%, for example 20%. I've tried rpart and the
> weights option, increasing the weights of the success-observations.
You are 'misleading' rpart by using 'weights', claiming to have case
weights for cases you do not have. You need to use 'cost' instead.
This is a standard issue, discussed in all good books on classification
(including mine).
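A minimal sketch of the cost-based approach, on simulated data. The data
frame 'dat' and the variables 'x1', 'x2' and 'y' are hypothetical and not
from the original post. In rpart, misclassification costs are supplied as
a loss matrix through 'parms' (rpart's own 'cost' argument is something
else: a per-variable scaling used when choosing splits). The 4:1 ratio is
an assumption, picked because a leaf is then labelled "success" once its
success rate exceeds 1/(1+4) = 20%.

```r
library(rpart)

set.seed(1)
n  <- 2000
x1 <- runif(n)
x2 <- runif(n)

# Hypothetical data: successes are rare overall, more common when x1 > 0.8
p <- ifelse(x1 > 0.8, 0.30, 0.02)
y <- factor(ifelse(runif(n) < p, "success", "failure"),
            levels = c("failure", "success"))
dat <- data.frame(y, x1, x2)

# Loss matrix: rows are the true class, columns the predicted class,
# both in the order of levels(y). The diagonal must be zero.
# Missing a success costs 4, a false alarm costs 1 (assumed values).
loss <- matrix(c(0, 1,
                 4, 0), nrow = 2, byrow = TRUE)

fit <- rpart(y ~ x1 + x2, data = dat, method = "class",
             parms = list(loss = loss))

print(fit)
# To draw the tree with per-leaf counts:
# plot(fit); text(fit, use.n = TRUE)
```

No case weights are involved here, so the per-class counts shown by
text(fit, use.n = TRUE) should be the actual numbers of observations.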
> It works as expected in terms of the tree creation: instead of a single
> root, a tree is being built. But the tree plot() and text() are somewhat
> misleading. I'm interested in the observation counts inside each leaf.
> I use the "use.n = TRUE" parameter. The counts displayed are misleading,
> the numbers of successes are not the original numbers from the sample,
> they seem to be cloned success-observations.
They _are_ the original numbers, for that is what 'case weights' means.
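A small sketch of what case weights do to the counts, again on simulated
data. The names 'dat', 'x1', 'x2', 'y' and 'w' and the weight of 10 are
hypothetical. A weight of 10 tells rpart that the row stands for 10
identical cases, so the class counts stored for each node are sums of
weights rather than numbers of rows.

```r
library(rpart)

set.seed(2)
n  <- 1000
x1 <- runif(n)
x2 <- runif(n)
y  <- factor(ifelse(runif(n) < ifelse(x1 > 0.8, 0.30, 0.02),
                    "success", "failure"),
             levels = c("failure", "success"))
dat <- data.frame(y, x1, x2)

# Each success is declared to represent 10 cases (assumed weight)
w <- ifelse(dat$y == "success", 10, 1)

fit_w <- rpart(y ~ x1 + x2, data = dat, weights = w, method = "class")

fit_w$frame$n[1]            # number of rows reaching the root
fit_w$frame$wt[1]           # sum of case weights at the root
fit_w$frame$yval2[1, 2:3]   # per-class counts at the root (weighted)
table(dat$y)                # unweighted class counts, for comparison
```

Comparing the last two lines shows where the "cloned" successes in the
plot come from.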
> I'd like to split the tree just as weights parameter allows me to,
> keeping the original number of observations in the tree plot. Is it
> possible? If yes, how?
>
> Kind regards,
> Maciej
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595