Dimitri Liakhovitski
2010-May-05 17:04 UTC
[R] Which column in randomForest importances (for regression) is MSE and which IncNodePurity
I've run the function randomForest with importance=T. All my variables
(predictors and the dependent variable) are numeric.
library(randomForest)
rf <- randomForest(formula, data = mydata, importance = TRUE)  # other arguments omitted
My results object "rf" contains the predictor importances:
rf$importance
I am seeing two columns:
       %IncMSE IncNodePurity
V1 -0.01683558      58.10910
V2  0.04000299      71.27579
V3  0.01974636      67.22586
V4  0.25020393     113.69823
V5  0.03146358      67.11151
V6  0.01717313      66.57246
V7 -0.00500985      62.37103
V8 -0.02862065      66.15369
V9 -0.02431507      54.50013
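Because the two columns carry names, one way to avoid the "which is which" problem when using them is to index by name rather than by position. A minimal base-R sketch, assuming nothing beyond the numbers in the printout above (re-entered by hand, so it runs without the original data or the randomForest package):

```r
# Importance values re-entered from the rf$importance printout in this message
imp <- matrix(
  c(-0.01683558,  58.10910,
     0.04000299,  71.27579,
     0.01974636,  67.22586,
     0.25020393, 113.69823,
     0.03146358,  67.11151,
     0.01717313,  66.57246,
    -0.00500985,  62.37103,
    -0.02862065,  66.15369,
    -0.02431507,  54.50013),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("V", 1:9), c("%IncMSE", "IncNodePurity"))
)

# Select each measure by its column name instead of its position
inc_mse     <- imp[, "%IncMSE"]
node_purity <- imp[, "IncNodePurity"]

# Rank predictors by each measure, largest first
rank_mse    <- names(sort(inc_mse, decreasing = TRUE))
rank_purity <- names(sort(node_purity, decreasing = TRUE))
rank_mse
rank_purity
```

With these numbers both measures put V4 first and V2 second, but they disagree further down: V8 is last by %IncMSE and sixth by IncNodePurity.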
They seem to be clearly labeled %IncMSE and IncNodePurity.
However, ?randomForest describes importance, a component of the rf object,
as follows:
A matrix with nclass + 2 (for classification) or two (for regression)
columns. For classification, the first nclass columns are the class-specific
measures computed as mean descrease in accuracy. The nclass + 1st column is
the mean descrease in accuracy over all classes. The last column is the mean
decrease in Gini index. *For Regression, the first column is the mean
decrease in accuracy and the second the mean decrease in MSE. If
importance=FALSE, the last measure is still returned as a vector.*
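For regression, the "accuracy" in "mean decrease in accuracy" is measured as MSE: the score is how much the prediction error grows when one predictor is randomly shuffled. In randomForest this permutation measure is what importance(rf, type = 1) returns, computed on out-of-bag data tree by tree and, by default, scaled by its standard deviation. The base-R sketch below only illustrates the idea: it swaps in lm() as a stand-in model, uses in-sample data, and the data and the helper perm_increase() are hypothetical, so it is not randomForest's implementation.

```r
# Hypothetical simulated data: y depends on x1 only; x2 is pure noise
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 3 * d$x1 + rnorm(n)

# Stand-in model: an ordinary linear regression, NOT a random forest
fit <- lm(y ~ x1 + x2, data = d)
mse <- function(model, newdata) mean((newdata$y - predict(model, newdata))^2)
base_mse <- mse(fit, d)

# Permutation importance: shuffle one predictor, re-predict, and record how
# much the MSE increases, averaged over n_rep shuffles
perm_increase <- function(var, n_rep = 20) {
  mean(replicate(n_rep, {
    shuffled <- d
    shuffled[[var]] <- sample(shuffled[[var]])
    mse(fit, shuffled) - base_mse
  }))
}

inc <- sapply(c("x1", "x2"), perm_increase)
inc
```

Shuffling the informative x1 raises the MSE sharply, while shuffling the noise variable x2 barely changes it; a value near zero or slightly negative, like several of the %IncMSE entries above, is what an uninformative predictor tends to produce.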
Maybe I am confused for no reason, but which column is which?
Is %IncMSE the mean decrease in accuracy?
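For the other column: the randomForest documentation for importance() describes its second measure (importance(rf, type = 2), labeled IncNodePurity for regression) as the total decrease in node impurity from splitting on the variable, averaged over all trees, where impurity for regression is the residual sum of squares (RSS). A minimal base-R sketch of that quantity for a single split, assuming simulated data and a hypothetical helper rss_decrease(); it is not randomForest's code:

```r
# Hypothetical simulated data: y jumps by 5 at x = 0
set.seed(2)
x <- runif(100, -1, 1)
y <- ifelse(x > 0, 5, 0) + rnorm(100)

# Node impurity for regression: residual sum of squares around the node mean
rss <- function(v) sum((v - mean(v))^2)

# Decrease in impurity when this node is split at x <= cut
rss_decrease <- function(x, y, cut) {
  left  <- y[x <= cut]
  right <- y[x > cut]
  rss(y) - (rss(left) + rss(right))
}

# Try cut points midway between sorted unique x values; keep the best one
xs <- sort(unique(x))
cuts <- (head(xs, -1) + tail(xs, -1)) / 2
gains <- sapply(cuts, function(cut) rss_decrease(x, y, cut))
best_cut <- cuts[which.max(gains)]
best_cut
max(gains)
```

A forest would add up such decreases over every split on a given variable in each tree and then average over trees, which is why IncNodePurity is on the scale of the response's sum of squares rather than a small MSE difference.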
Thanks a lot for clarifying!
--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski@ninah.com
