Tal Galili
2011-Jan-24 08:07 UTC
[R] How to measure/rank “variable importance” when using rpart?
Hello all,

When building a CART model (specifically a classification tree) using rpart, it is sometimes useful to know the importance of the various variables introduced to the model.

Thus, my question is: *What common measures exist for ranking/measuring the importance of the variables participating in a CART model? And how can they be computed in R (for example, when using the rpart package)?*

For example, here is some dummy code, created so you can demonstrate your solutions on it. The example is structured so that variables x1 and x2 are clearly "important", while (in some sense) x1 is more important than x2 (since x1 applies to more cases, and thus has more influence on the structure of the data, than x2).

set.seed(31431)
n <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
x4 <- rnorm(n)
x5 <- rnorm(n)
X <- data.frame(x1, x2, x3, x4, x5)
y <- sample(letters[1:4], n, TRUE)
y <- ifelse(X[,2] < -1, "b", y)
y <- ifelse(X[,1] < 0, "a", y)
require(rpart)
fit <- rpart(y ~ ., X)
plot(fit); text(fit)
info.gain.rpart(fit) # your function, telling us how important each variable is

(References are always welcome.)

Thanks!
Tal
Liaw, Andy
2011-Jan-24 15:21 UTC
[R] How to measure/rank "variable importance" when using rpart?
Check out caret::varImp.rpart(). The importance measure it computes is described in the original CART book.

Andy
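The reply above points to the caret package. As an alternative that needs only rpart itself, here is a minimal sketch run on the dummy data from the question. It assumes a reasonably recent rpart, in which the fitted object carries a `variable.importance` component (older versions, including those current when this thread was written, may not have it). As far as I understand the rpart documentation, that component sums the goodness-of-split improvements for each split where the variable is the primary splitter, plus a weighted contribution where it acts as a surrogate. The names `dat`, `imp` and `imp_pct` are introduced here for illustration only.

```r
library(rpart)

# Dummy data from the question: x1 and x2 drive the class, x3..x5 are noise.
set.seed(31431)
n <- 400
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n),
                x4 = rnorm(n), x5 = rnorm(n))
y <- sample(letters[1:4], n, TRUE)
y <- ifelse(X$x2 < -1, "b", y)
y <- ifelse(X$x1 < 0, "a", y)

# Keep the response in the same data frame and make it a factor explicitly,
# so rpart fits a classification tree.
dat <- cbind(X, y = factor(y))
fit <- rpart(y ~ ., data = dat, method = "class")

# Assumption: the installed rpart stores importance scores on the fit.
# Only variables used as primary or surrogate splitters appear here.
imp <- sort(fit$variable.importance, decreasing = TRUE)

# Rescale to percentages so the ranking is easier to read.
imp_pct <- round(100 * imp / sum(imp), 1)
print(imp_pct)
```

Note that surrogate splits can give noise variables a small non-zero score, so the ranking is better read as a rough guide than as a test of relevance.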