Mike Williamson
2010-Jul-13 23:46 UTC
[R] question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"
Hi everyone,
I have another "Random Forest" package question:
- my (presumably incorrect) understanding of the varImpPlot is that it
  should plot the "% increase in MSE" and "IncNodePurity" exactly as can
  be found from the "importance" section of the model results.
- However, the plot does not, in fact, match the "importance" section
  of the random forest model.
E.g., if you run the example given in ?randomForest, the plot shows the
highest few "%IncMSE" values at around 17 or 18%. But if you look at
$importance, the values are 9.7, 9.4, 7.7, and 7.3. Perhaps more
importantly, the plot ranks "wt" highest by %IncMSE, then "disp", then
"cyl", then "hp"; whereas $importance ranks "wt", then "disp", then
"hp", then "cyl". And the ratios look somewhat different, too.
Here is the code for that example:
library(randomForest)
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
                          keep.forest=FALSE, importance=TRUE)
varImpPlot(mtcars.rf)
I am using version 2.11.1 of R and version 4.5-35 of randomForest.
I don't really need varImpPlot to work just right, but I am not sure
which is accurate: the varImpPlot or the $importance section. Which
should I trust more, especially when they disagree appreciably?
Thanks!
Mike
"Telescopes and bathyscaphes and sonar probes of Scottish lakes,
Tacoma Narrows bridge collapse explained with abstract phase-space maps,
Some x-ray slides, a music score, Minard's Napoleonic war:
The most exciting frontier is charting what's already here."
-- xkcd
Allan Engelhardt
2010-Jul-15 07:54 UTC
[R] question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"
Use the source, Luke. varImpPlot calls
randomForest:::importance.randomForest (yeah, that is three colons) and
reading about the scale= parameter in help("importance",
package="randomForest") should enlighten you. For the impatient, try
varImpPlot(mtcars.rf, scale=FALSE)
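
To make the difference concrete, here is a minimal sketch that assumes the randomForest package is installed and reuses the mtcars example from the question. It relies on the documented behaviour that importance() with scale=TRUE (the default) divides the permutation-based measure by its "standard error", stored in $importanceSD, whereas $importance holds the raw values. The variable names raw, scaled and unscaled are only illustrative.

```r
library(randomForest)

set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data = mtcars, ntree = 1000,
                          keep.forest = FALSE, importance = TRUE)

# Raw permutation importance, exactly as stored on the fitted object.
raw <- mtcars.rf$importance[, "%IncMSE"]

# What varImpPlot() draws by default: importance() with scale = TRUE.
scaled <- importance(mtcars.rf, type = 1)[, 1]

# What varImpPlot(mtcars.rf, scale = FALSE) draws.
unscaled <- importance(mtcars.rf, type = 1, scale = FALSE)[, 1]

# The unscaled measure should reproduce $importance ...
print(all.equal(unname(unscaled), unname(raw)))

# ... and the scaled measure should be raw / importanceSD.
print(all.equal(unname(scaled), unname(raw / mtcars.rf$importanceSD)))

# Side-by-side view, sorted by the raw measure.
print(cbind(raw, scaled)[order(raw, decreasing = TRUE), ])
```

If both checks print TRUE, the plot and $importance are showing the same measure on different scales, so neither is wrong. The variable ordering can still differ between the two, because each variable is divided by its own standard error.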
Hope this helps a little.
Allan