Tal Galili
2010-Dec-14 14:33 UTC
[R] rpart - how to estimate the “meaningful” predictors for an outcome (in classification trees)
Hi dear R-help members,

When building a CART model (specifically, a classification tree) using rpart, it is sometimes obvious that some variables (X's) are meaningful for predicting some of the outcome (y) values, while other predictors are relevant only for other outcome values.

*How can one estimate which explanatory variable is "used" for which of the predicted values of the outcome variable?*

Here is example code in which x2 is the only important variable for predicting "b" (one of the y outcomes). There is no predicting variable for "c", and x1 is a predictor for "a", provided that x2 permits it (i.e. x2 >= 0.1, since the assignment y[x2 < .1] <- "b" comes last and overrides "a").

How can this situation be shown using an rpart fitted model?

    library(rpart)
    N <- 200
    set.seed(5123)
    x1 <- runif(N)
    x2 <- runif(N)
    x3 <- runif(N)
    y <- sample(letters[1:3], N, TRUE)
    y[x1 < .5] <- "a"
    y[x2 < .1] <- "b"
    fit <- rpart(y ~ x1 + x2, method = "class")
    fit2 <- prune(fit, cp = 0.07)
    plot(fit2)
    text(fit2, use.n = TRUE)

Thanks,
Tal

---------------- Contact Details ----------------
Contact me: Tal.Galili@gmail.com | 972-52-7275845
Read me: talgalili.com (Hebrew) | biostatistics.co.il (Hebrew) | r-statistics.com (English)
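One way to read this off a single fitted tree is sketched below, under a few assumptions about rpart internals: nodes are numbered so that the children of node n are 2n and 2n+1, and `fit$frame` stores for each node the split variable (`var`, equal to "<leaf>" for terminal nodes) and, for classification trees, the predicted class as an index (`yval`) into `attr(fit, "ylevels")`. The helper `leaf_predictors()` is hypothetical (it is not part of rpart); it walks from each leaf back to the root and lists the variables split on along the way, which gives a per-class view of which predictors were used.

```r
library(rpart)

# Same simulated data as in the question
N <- 200
set.seed(5123)
x1 <- runif(N)
x2 <- runif(N)
x3 <- runif(N)
y <- sample(letters[1:3], N, TRUE)
y[x1 < .5] <- "a"
y[x2 < .1] <- "b"
dat <- data.frame(y = factor(y), x1, x2, x3)

fit <- rpart(y ~ x1 + x2, data = dat, method = "class")
fit2 <- prune(fit, cp = 0.07)

# Hypothetical helper (not part of rpart): for every leaf, report the
# predicted class and the variables split on between the root and that leaf.
leaf_predictors <- function(tree) {
  fr <- tree$frame
  nodes <- as.integer(rownames(fr))   # rpart node ids: children of n are 2n, 2n+1
  ylev <- attr(tree, "ylevels")       # class labels; frame$yval indexes into these
  leaves <- which(fr$var == "<leaf>")
  rows <- lapply(leaves, function(i) {
    anc <- integer(0)
    k <- nodes[i]
    while (k > 1) {                   # climb to the root via integer division
      k <- k %/% 2
      anc <- c(k, anc)                # prepend, so the root ends up first
    }
    vars <- unique(as.character(fr$var[match(anc, nodes)]))
    data.frame(node = nodes[i],
               predicted = ylev[fr$yval[i]],
               n = fr$n[i],
               variables = paste(vars, collapse = ", "),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

tab <- leaf_predictors(fit2)
print(tab)
```

A class whose leaves are reached only through x2 splits would match the situation described for "b". The exact tree depends on the data, so no output is shown here. Newer versions of rpart also store an overall (not per-class) ranking in `fit$variable.importance`.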
Xiaogang Su
2010-Dec-14 23:03 UTC
Hi, Tal,

Here is a quick workaround. First create two responses via dummy variables:

    y1 <- ifelse(y == "a", 1, 0)
    y2 <- ifelse(y == "b", 1, 0)

and then build a separate tree model for each of y1 and y2.

Hope it helps.

Xiaogang
--
Xiaogang Su, Ph.D.
Associate Professor, Statistician
School of Nursing, University of Alabama
Birmingham, AL 35294-1210
(205) 934-2355 [Office]
xgsu at uab.edu
xiaogangsu at gmail.com
homepage.uab.edu/xgsu
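The dummy-variable idea can be extended to every level of y in a loop. The sketch below is a variant, not Xiaogang's exact code: it fits a binary classification tree (factor response, method = "class") instead of a regression tree on a 0/1 numeric response, then lists the variables each tree actually splits on. The object names (`levs`, `ovr_fits`, `used_vars`) are illustrative only.

```r
library(rpart)

# Same simulated data as in the original question
N <- 200
set.seed(5123)
x1 <- runif(N)
x2 <- runif(N)
x3 <- runif(N)
y <- sample(letters[1:3], N, TRUE)
y[x1 < .5] <- "a"
y[x2 < .1] <- "b"

levs <- sort(unique(y))

# One-vs-rest: one binary tree per outcome level ("this level" vs "all others")
ovr_fits <- lapply(levs, function(lev) {
  d <- data.frame(yk = factor(y == lev, levels = c(FALSE, TRUE)), x1, x2)
  rpart(yk ~ x1 + x2, data = d, method = "class")
})
names(ovr_fits) <- levs

# Split variables used by each tree (empty if the tree never splits)
used_vars <- lapply(ovr_fits, function(f) {
  v <- as.character(f$frame$var)
  sort(unique(v[v != "<leaf>"]))
})
print(used_vars)
```

These trees are unpruned. In practice one would prune each of them (for example with prune() and a cp value chosen from printcp()) before reading off which variables matter, since small noise splits can otherwise show up, particularly in the tree for "c".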