2011 Jun 21
How does rpart compute "improve" for split="information"? (It seems to differ from the "gini" case)
...impurity is still a
mystery to me.
Could you help explain it?
Below is some R code showing how the Gini improvement is computed (and how the
information case is not as clear):
# creating data
set.seed(1324)
y <- sample(c(0, 1), 20, TRUE)  # TRUE rather than T, which can be reassigned
x <- y
x[1:5] <- 0
# manually making the first split
obs_L <- y[x<.5]
obs_R <- y[x>.5]
n_L <- sum(x<.5)
n_R <- sum(x>.5)
n <- length(x)
calc.impurity <- function(func = gini)
{
  impurity_root <- func(prop.table(table(y)))
  impurity_L <- func(prop.table(table(obs_L)))
  impurity_R <- func(prop.table(table(obs_R)))
  imp &l...
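
For comparison, here is a minimal self-contained sketch of the textbook calculation the question is about: the weighted decrease in impurity for one binary split, once with the Gini index and once with Shannon entropy. The helper name `impurity_decrease`, the use of the natural log, and the scaling by n are assumptions made for illustration. None of this is taken from rpart's source, so it may or may not reproduce the `improve` value rpart reports for split="information".

```r
# Textbook impurity measures on a vector of class proportions p.
gini <- function(p) sum(p * (1 - p))
# Shannon entropy. Natural log is an assumption; 0 * log(0) is treated as 0.
entropy <- function(p) -sum(ifelse(p > 0, p * log(p), 0))

# Weighted decrease in impurity for a binary split of y, where `left` is a
# logical vector marking the observations sent to the left child.
# `impurity_decrease` is a hypothetical helper, not an rpart function.
impurity_decrease <- function(y, left, func = gini) {
  props <- function(v) as.vector(prop.table(table(v)))
  n <- length(y)
  n_L <- sum(left)
  n_R <- n - n_L
  func(props(y)) -
    (n_L / n) * func(props(y[left])) -
    (n_R / n) * func(props(y[!left]))
}

# Same toy data as in the post.
set.seed(1324)
y <- sample(c(0, 1), 20, TRUE)
x <- y
x[1:5] <- 0
left <- x < 0.5

n <- length(y)
gini_gain <- impurity_decrease(y, left, gini)
info_gain <- impurity_decrease(y, left, entropy)

# Scaling by n is one common convention (an assumption here); compare these
# against the `improve` column of the fitted tree's splits.
print(c(gini = n * gini_gain, information = n * info_gain))
```

Comparing the two printed numbers with what rpart reports for the same split would show whether the difference is only a matter of log base or scaling, or whether the information criterion is computed differently.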