Dimitri Liakhovitski
2009-Apr-20 18:35 UTC
[R] Random Forests: Predictor importance for Regression Trees
Hello!

I think I am relatively clear on how predictor importance (the first one) is calculated by Random Forests for a classification tree:

Importance of predictor P1 when the response variable is categorical:

1. For out-of-bag (oob) cases, randomly permute their values on predictor P1 and then put them down the tree.
2. For a given tree, subtract the number of votes for the correct class in the predictor-P1-permuted oob dataset from the number of votes for the correct class in the untouched oob dataset: if P1 is important, this difference will be large.
3. The average of this difference over all trees in the forest is the raw importance score for predictor P1.

I am wondering what step 2 above looks like if the response variable is continuous rather than categorical, in other words, for a regression tree. Could you please correct me if what I wrote below is wrong? Thank you very much!

Importance of predictor P1 when the response variable is continuous:

1. For out-of-bag (oob) cases, randomly permute their values on predictor P1 and then put them down the tree.
2. For a given tree, calculate the mean squared deviation of observed y from predicted y for (a) the untouched oob dataset and (b) the predictor-P1-permuted oob dataset. Subtract (a) from (b).
3. The average of this difference over all trees in the forest is the raw importance score for predictor P1.

--
Dimitri Liakhovitski
MarketTools, Inc.
Dimitri.Liakhovitski at markettools.com
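[Editor's note: a minimal R sketch of the regression step 2 above, to make the MSE difference concrete. It is only an illustration: it permutes a predictor over a single evaluation set and uses the whole forest's predictions via predict(), rather than the per-tree out-of-bag cases that randomForest uses internally, so its numbers will not match importance() exactly. The airquality data and the helper name perm_importance_mse are assumptions made for the example.]

library(randomForest)

set.seed(1)
dat <- na.omit(airquality)                 # predict Ozone from weather variables
fit <- randomForest(Ozone ~ ., data = dat)

perm_importance_mse <- function(fit, data, response, predictor) {
  obs   <- data[[response]]
  mse_a <- mean((obs - predict(fit, data))^2)             # (a) untouched data
  permuted <- data
  permuted[[predictor]] <- sample(permuted[[predictor]])  # permute predictor P1
  mse_b <- mean((obs - predict(fit, permuted))^2)         # (b) P1-permuted data
  mse_b - mse_a                                           # large if P1 matters
}

sapply(setdiff(names(dat), "Ozone"),
       function(p) perm_importance_mse(fit, dat, "Ozone", p))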
Liaw, Andy
2009-Apr-21 12:08 UTC
[R] Random Forests: Predictor importance for Regression Trees
Yes, you've got it!

Cheers,
Andy
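[Editor's note: for reference, the permutation importance discussed in this thread is what the randomForest package reports when the forest is grown with importance = TRUE. A short sketch, again using the airquality data as an assumed example:]

library(randomForest)

set.seed(1)
dat <- na.omit(airquality)
fit <- randomForest(Ozone ~ ., data = dat, importance = TRUE)

importance(fit, type = 1)   # %IncMSE: permutation importance for each predictor
varImpPlot(fit, type = 1)   # the same scores drawn as a dot chart

By default importance() scales the per-tree differences by their standard deviation across trees; importance(fit, type = 1, scale = FALSE) returns the raw averages described in the steps above.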