thr3ads.net - R help - [R] randomForest: predictor importance (for regressions) [May 2010]


Dimitri Liakhovitski

2010-May-05 17:51 UTC

[R] randomForest: predictor importance (for regressions)

I have a question about predictor importances in randomForest.

Once I've run randomForest and got my object, I get their importances:
rfresult$importance
I also get the "standard errors" of the permutation-based importance
measure: rfresult$importanceSD

I have 2 questions:

1. Because I am dealing with regressions, I am getting an importance object
(rfresult$importance) with two columns, labeled "%IncMSE" (the first column)
and "IncNodePurity" (the second column). I assume it's the first one that is
the mean decrease in accuracy due to permutation. Am I correct? I am confused
because ?randomForest says: "For Regression, the first column is the mean
decrease in accuracy and the second the mean decrease in MSE." - but it is
the first column, not the second, that has "MSE" in its header.

2. According to this thread (
http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg94873.html), the
overall importance measure is mean(d[i]) / se(d[i]), where se(d[i]) is
sd(d[i])/sqrt(ntree) (the "standard error").
So, in order to get at the importance of predictors (and I want to use the
permutation-based importance), should I just take the first column of
rfresult$importance, or should I first divide rfresult$importance by
rfresult$importanceSD to get something analogous to z-scores and use
those?

Thank you very much!

-- 
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski@ninah.com


Liaw, Andy

2010-May-06 12:37 UTC


[R] randomForest: predictor importance (for regressions)

See reply inline below. 

Andy

From: Dimitri Liakhovitski
> I have a question about predictor importances in randomForest.
>
> Once I've run randomForest and got my object, I get their importances:
> rfresult$importance
> I also get the "standard errors" of the permutation-based importance
> measure: rfresult$importanceSD
>
> I have 2 questions:
>
> 1. Because I am dealing with regressions, I am getting an importance object
> (rfresult$importance) with two columns, labeled "%IncMSE" (the first column)
> and "IncNodePurity" (the second column). I assume it's the first one that is
> the mean decrease in accuracy due to permutation. Am I correct or am I
> wrong? I am confused because ?randomForest says: "For Regression, the first
> column is the mean decrease in accuracy and the second the mean decrease in
> MSE." - but it is the first column, not the second that has "MSE" in its
> header.

In regression trees, node impurity is measured by MSE, therefore the
second measure that averages cumulative reduction in node impurity due
to splits by a variable over all trees is labelled as "mean decrease in
MSE".
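
[Editorial sketch, not part of the original reply.] The column labels can be checked directly on a small regression forest. The example below is a minimal sketch that assumes the randomForest package is installed; the built-in mtcars data, the seed, and the object names (rf, perm_imp, purity_imp) are illustrative choices, not anything from the thread. It uses the importance() extractor with type = 1 for the permutation-based measure and type = 2 for the node-impurity measure.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(42)  # arbitrary seed, only to make the forest reproducible

# mtcars is a built-in data set, used here purely for illustration
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# Column labels of the stored importance matrix for a regression forest
print(colnames(rf$importance))

# type = 1: permutation-based measure (the column labelled "%IncMSE")
perm_imp <- importance(rf, type = 1)

# type = 2: total decrease in node impurity (the column labelled "IncNodePurity")
purity_imp <- importance(rf, type = 2)

print(colnames(perm_imp))
print(colnames(purity_imp))
```

So the permutation-based measure is the first column ("%IncMSE"), and the node-impurity measure described above is the second ("IncNodePurity").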

> 2. According to this thread (
> http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg94873.html), The
> overall importance measure is mean(d[i]) / se(d[i]), where se(d[i]) is
> sd(d[i])/sqrt(ntree) (the "standard error").
> So, in order to get at the importance of predictors (and I want to use the
> permutation-based importance) - should I just take the first column of
> rfresult$importance or should I first divide rfresult$importance by
> rfresult$importanceSD - to get something analogous to z-scores and use
> those?

See the "scale" argument in ?importance.  The recommended way of
extracting components of an object in R is to use the extractor
functions when they exist.
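
[Editorial sketch, not part of the original reply.] To make the effect of the "scale" argument concrete, the example below compares importance(..., scale = FALSE) and importance(..., scale = TRUE) with a manual division by importanceSD. It is a minimal sketch assuming the randomForest package is installed; mtcars, the seed, and the names rf, raw, scaled, and manual are illustrative. The manual division is expected to match the scaled values as long as no standard error is numerically zero, which is an assumption about this particular fit.

```r
library(randomForest)  # assumption: the randomForest package is installed

set.seed(1)  # arbitrary seed, only to make the forest reproducible
rf <- randomForest(mpg ~ ., data = mtcars, ntree = 500, importance = TRUE)

# Permutation-based importance via the extractor function
raw    <- importance(rf, type = 1, scale = FALSE)  # mean increase in MSE
scaled <- importance(rf, type = 1, scale = TRUE)   # divided by its "standard error"

# The same ratio computed by hand from the stored components
manual <- rf$importance[, "%IncMSE"] / rf$importanceSD

# Side-by-side view, one row per predictor
print(cbind(raw = raw[, 1], scaled = scaled[, 1], manual = manual))
```

Since scale = TRUE is the default of importance(), calling importance(rf, type = 1) already returns the z-score-like values, whereas rf$importance holds the unscaled ones.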
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
