Li Li <lilycai2007 <at> gmail.com> writes:
>
> Hi,
>
> I am using some classifiers in the RWeka package and have run into a couple of problems.
>
> (1) J48 implements the C4.5 classifier, and C4.5 should be able to handle
> missing values in both the training set and the test set. But I found that
> the J48 classifier cannot be evaluated on a test set with missing values;
> it just ignores them.
Why don't you ask this question on the WEKA mailing list at, for instance,
http://news.gmane.org/gmane.comp.ai.weka ?
If I remember correctly, C4.5 is only smart enough to simply drop examples with
missing values, while C5.0 handles them more intelligently. C5.0 also treats
numerical attributes more sensibly than C4.5 or CART.
Unfortunately, C5.0 is commercial software, but you can get a two-week demo from
Quinlan's site.
> (2) The ensemble classifiers in RWeka, such as bagging and boosting: there
> is a control argument "W" that specifies which base classifier should
> be used.
> I use "W=J48" to boost a C4.5 tree, but I am not sure how to downsize the
> tree to be a "weak" learner. From what I have observed, the default boosted
> J48 tree gets worse performance.
This is difficult to answer without any concrete data. From my own experience I
can say that in many cases results have distinctly improved when applying
AdaBoost.
>
> Thanks for any discussion and help,
>
> Li
>
Regards, Hans Werner Borchers