On Tue, 21 Jan 2003, Doug Kitch wrote:
> Hello. I am not sure if you can help me or not but I have a dataset with
> N ~ 4000 with binary response and p ~ 0.08, regardless of how many or
> how few variables I offer I get the following message: 'Error in
> rpart(formula, method="class"): No splits could be created
> Dumped.' If I
> run tree with the same dataset (no missing data) in S I get results. Is
> there a problem with large datasets in rpart?
If there were it would not be relevant: 4000 is not close to `large'.
I suspect you ought to be using losses with such a skewed binary
response, and am not surprised that no single split is effective.
?rpart.control should help you.
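As a rough sketch only of what using losses might look like: the data
frame `d`, the variables `x1`, `x2`, `y`, the 10:1 loss ratio and the cp
value below are all made up for illustration, and none of them come from
your data. The loss matrix goes in via `parms`, with rows as the true
class and columns as the predicted class; the complexity settings go in
via `rpart.control`.

```r
library(rpart)

## Simulated stand-in for the real data: n = 4000, roughly 8% "yes".
set.seed(1)
n <- 4000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
p <- plogis(-2.7 + 0.6 * d$x1)          # weak dependence on x1 only
d$y <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

## Loss matrix: rows = true class, columns = predicted class,
## in the order of levels(d$y). The diagonal must be zero.
## Here missing a true "yes" costs 10 (an assumed figure),
## a false alarm costs 1.
L <- matrix(c( 0, 1,
              10, 0), nrow = 2, byrow = TRUE)

fit <- rpart(y ~ x1 + x2, data = d, method = "class",
             parms = list(loss = L),
             control = rpart.control(cp = 0.001))  # cp below the 0.01 default
print(fit)
```

The loss ratio should reflect the real cost of the two kinds of error,
and a tree grown with a small cp should then be pruned back, for example
after looking at printcp(fit).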
> Also, do you happen to know the parameter options which
> will make rpart and tree act the same. I am wondering if
> this is possible since I have no missing data.
It's not exactly possible, but look in MASS4 for some comparisons.
Given that tree in S does not do what it is documented to do, it would be
hard to reproduce, but tree in R comes pretty close to tree in S's
documented behaviour.
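
For what it is worth, a sketch of settings that narrow the gap (they do
not remove it). It assumes tree's documented defaults of deviance
splitting with mincut = 5 and minsize = 10; the data frame `d` is again
invented for illustration. tree's mindev and rpart's cp are different
stopping rules, so cp = 0 here simply grows a large tree and leaves any
pruning for afterwards.

```r
library(rpart)

## Invented data, for illustration only.
set.seed(2)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- factor(ifelse(d$x1 + rnorm(n) > 1, "yes", "no"),
              levels = c("no", "yes"))

fit <- rpart(y ~ x1 + x2, data = d, method = "class",
             ## entropy splitting, the analogue of tree's deviance
             ## criterion (rpart's default is Gini)
             parms = list(split = "information"),
             control = rpart.control(
               minsplit     = 10,  # cf. minsize = 10 in tree.control
               minbucket    = 5,   # cf. mincut = 5 in tree.control
               cp           = 0,   # no complexity-based stopping
               maxsurrogate = 0,   # no missing data, so no surrogates
               xval         = 0))  # skip cross-validation
print(fit)
```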
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595