On Thu, 17 Feb 2011, Andrew Ziem wrote:
> After ctree builds a tree, how would I determine the direction missing
> values follow by examining the BinaryTree-class object? For instance in the
> example below Bare.nuclei has 16 missing values and is used for the first split,
> but the missing values are not listed in either set of factors. (I have the
> same question for missing values among numeric [non-factor] values, but I assume
> the answer is similar.)
Hi Andrew,
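ctree() doesn't treat missing values in a factor as a category in its own
right. Instead, it uses surrogate splits to determine which daughter node
the observations with missing values in the primary split variable are sent
to. For that you need to specify `maxsurrogate' in ctree_control(); with its
default of 0 no surrogate splits are computed, which is why $ssplit is an
empty list in your output.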
Alternatively, you can recode your factor and add NA to its levels. ctree()
will then treat the missing values as a category of their own, which is the
behaviour you were expecting.
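
A minimal sketch of that recoding, using base R's addNA() on a small made-up
factor (the vector `x' is hypothetical and only stands in for a column such as
BreastCancer$Bare.nuclei):

```r
# Toy factor with two missing values (hypothetical stand-in data)
x <- factor(c("1", "2", NA, "10", NA))

# addNA() turns NA into an explicit extra factor level
x_na <- addNA(x)

levels(x_na)      # the original levels plus NA
table(x_na)       # NA is now counted like any other level
sum(is.na(x))     # missing values before recoding
sum(is.na(x_na))  # after recoding, no element is flagged as missing
```

Applied to your data this would presumably be something like
BreastCancer$Bare.nuclei <- addNA(BreastCancer$Bare.nuclei) before calling
ctree(); I have not rerun your example with it.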
Best,
Torsten
>
>
>> require(party)
>> require(mlbench)
>> data(BreastCancer)
>> BreastCancer$Id <- NULL
>> ct <- ctree(Class ~ . , data=BreastCancer, controls =
>> ctree_control(maxdepth = 1))
>> ct
>
> Conditional inference tree with 2 terminal nodes
>
> Response: Class
> Inputs: Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size,
> Bare.nuclei, Bl.cromatin, Normal.nucleoli, Mitoses
> Number of observations: 699
>
> 1) Bare.nuclei == {1, 2}; criterion = 1, statistic = 488.294
> 2)* weights = 448
> 1) Bare.nuclei == {3, 4, 5, 6, 7, 8, 9, 10}
> 3)* weights = 251
>> sum(is.na(BreastCancer$Bare.nuclei))
> [1] 16
>> nodes(ct, 1)[[1]]$psplit
> Bare.nuclei == {1, 2}
>> nodes(ct, 1)[[1]]$ssplit
> list()
>
>
>
> Based on below, the answer is node 2, but I don't see it in the object.
>
>> sum(BreastCancer$Bare.nuclei %in% c(1,2,NA))
> [1] 448
>> sum(BreastCancer$Bare.nuclei %in% c(1,2))
> [1] 432
>> sum(BreastCancer$Bare.nuclei %in% c(3:10))
> [1] 251
>
>
> Andrew