thr3ads.net - R help - [R] missing values in party::ctree [Feb 2011]

Andrew Ziem

2011-Feb-17 19:23 UTC

[R] missing values in party::ctree

After ctree builds a tree, how would I determine the direction missing values
follow by examining the BinaryTree-class object?  For instance in the example
below Bare.nuclei has 16 missing values and is used for the first split, but the
missing values are not listed in either set of factors.   (I have the same
question for missing values among numeric [non-factor] values, but I assume the
answer is similar.)

> require(party)
> require(mlbench)
> data(BreastCancer)
> BreastCancer$Id <- NULL
> ct <- ctree(Class ~ . , data=BreastCancer, controls =
ctree_control(maxdepth = 1))
> ct
         Conditional inference tree with 2 terminal nodes

Response:  Class 
Inputs:  Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size,
Bare.nuclei, Bl.cromatin, Normal.nucleoli, Mitoses
Number of observations:  699 

1) Bare.nuclei == {1, 2}; criterion = 1, statistic = 488.294
  2)*  weights = 448 
1) Bare.nuclei == {3, 4, 5, 6, 7, 8, 9, 10}
  3)*  weights = 251 
> sum(is.na(BreastCancer$Bare.nuclei))
[1] 16
> nodes(ct, 1)[[1]]$psplit
Bare.nuclei == {1, 2}
> nodes(ct, 1)[[1]]$ssplit
list()



Based on below, the answer is node 2, but I don't see it in the object.
> sum(BreastCancer$Bare.nuclei %in% c(1,2,NA))
[1] 448
> sum(BreastCancer$Bare.nuclei %in% c(1,2))
[1] 432
> sum(BreastCancer$Bare.nuclei %in% c(3:10))
[1] 251


Andrew

Torsten Hothorn

2011-Feb-18 08:07 UTC

[R] missing values in party::ctree

On Thu, 17 Feb 2011, Andrew Ziem  wrote:
> After ctree builds a tree, how would I determine the direction missing
> values follow by examining the BinaryTree-class object?  For instance in the
> example below Bare.nuclei has 16 missing values and is used for the first
> split, but the missing values are not listed in either set of factors.  (I
> have the same question for missing values among numeric [non-factor] values,
> but I assume the answer is similar.)

Hi Andrew,

ctree() doesn't treat missings in factors as a category in its own right.
Instead, it uses surrogate splits to determine the daughter node that
observations with missings in the primary split variable are sent to (you
need to specify `maxsurrogate' in ctree_control()).
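
As a rough illustration of the surrogate-split route, the sketch below refits the tree from the question with surrogate splits switched on. It assumes the control argument is spelled `maxsurrogate` (as in party's ctree_control()), picks the value 1 arbitrarily, and reuses the `$ssplit` accessor from the original transcript; the name `ct_s` is made up here. It has not been checked against a particular party version, and the party documentation notes that surrogate splits are only implemented for ordered covariates, so treat the result as something to inspect rather than a guaranteed outcome.

```r
# Sketch only: needs the 'party' and 'mlbench' packages; skipped if absent.
have_pkgs <- requireNamespace("party", quietly = TRUE) &&
  requireNamespace("mlbench", quietly = TRUE)

if (have_pkgs) {
  library(party)
  data(BreastCancer, package = "mlbench")
  BreastCancer$Id <- NULL

  # maxsurrogate = 1 is an arbitrary, illustrative choice (assumption)
  ct_s <- ctree(Class ~ ., data = BreastCancer,
                controls = ctree_control(maxdepth = 1, maxsurrogate = 1))

  # Surrogate splits found for the root node, if any, are stored here;
  # an empty list() means none were recorded.
  print(nodes(ct_s, 1)[[1]]$ssplit)
}
```

If the printed list is non-empty, those splits are what ctree() uses to route observations whose primary split variable is missing.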

However, you can recode your factor and add NA to the levels. This will
lead to the intended behaviour.
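
A minimal sketch of the recoding step, using base R only and a small made-up factor in place of BreastCancer$Bare.nuclei. addNA() is the base R helper that appends NA as an extra level; relabelling that level as "missing" is an optional extra assumed here for readable printed output, not something suggested in the thread. The names `bare`, `bare_na` and `bare_lab` are hypothetical.

```r
# Toy factor standing in for BreastCancer$Bare.nuclei (values are made up)
bare <- factor(c("1", "2", NA, "10", NA, "1"), levels = c("1", "2", "10"))

# addNA() appends NA as an extra level, so missings become a category
bare_na <- addNA(bare)

# Optional (assumption): give the NA level a printable label
bare_lab <- bare_na
levels(bare_lab)[is.na(levels(bare_lab))] <- "missing"

# Counts per level, including the former missings
print(table(bare_lab))
```

Applied to the real data, the same recoding would be done on BreastCancer$Bare.nuclei before calling ctree(), after which the former missings can appear in a split set like any other level.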

Best,

Torsten
>
>
>> require(party)
>> require(mlbench)
>> data(BreastCancer)
>> BreastCancer$Id <- NULL
>> ct <- ctree(Class ~ . , data=BreastCancer, controls =
>> ctree_control(maxdepth = 1))
>> ct
>
>         Conditional inference tree with 2 terminal nodes
>
> Response:  Class
> Inputs:  Cl.thickness, Cell.size, Cell.shape, Marg.adhesion, Epith.c.size,
> Bare.nuclei, Bl.cromatin, Normal.nucleoli, Mitoses
> Number of observations:  699
>
> 1) Bare.nuclei == {1, 2}; criterion = 1, statistic = 488.294
>  2)*  weights = 448
> 1) Bare.nuclei == {3, 4, 5, 6, 7, 8, 9, 10}
>  3)*  weights = 251
>> sum(is.na(BreastCancer$Bare.nuclei))
> [1] 16
>> nodes(ct, 1)[[1]]$psplit
> Bare.nuclei == {1, 2}
>> nodes(ct, 1)[[1]]$ssplit
> list()
>
>
>
> Based on below, the answer is node 2, but I don't see it in the object.
>
>> sum(BreastCancer$Bare.nuclei %in% c(1,2,NA))
> [1] 448
>> sum(BreastCancer$Bare.nuclei %in% c(1,2))
> [1] 432
>> sum(BreastCancer$Bare.nuclei %in% c(3:10))
> [1] 251
>
>
> Andrew
>
>
