Mike Williamson
2010-Jul-13 23:46 UTC
[R] question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"
Hi everyone,

I have another question about the "randomForest" package:

- My (presumably incorrect) understanding of varImpPlot is that it should plot "%IncMSE" and "IncNodePurity" exactly as they appear in the "importance" component of the model results.
- However, the plot does not, in fact, match the "importance" component of the random forest model.

E.g., if you run the example given in ?randomForest, the plot shows the highest few "%IncMSE" values at around 17 or 18%. But in $importance they are 9.7, 9.4, 7.7, and 7.3. Perhaps more importantly, the plot ranks "wt" highest in %IncMSE, then "disp", then "cyl", then "hp", whereas $importance ranks "wt", then "disp", then "hp", then "cyl". The ratios look somewhat different, too. Here is the code for that example:

    library(randomForest)
    set.seed(4543)
    data(mtcars)
    mtcars.rf <- randomForest(mpg ~ ., data=mtcars, ntree=1000,
                              keep.forest=FALSE, importance=TRUE)
    varImpPlot(mtcars.rf)

I am using R version 2.11.1 and randomForest version 4.5-35.

I don't really need varImpPlot to work just right, but I am not sure which is accurate: the varImpPlot output or the $importance component. Which should I trust more, especially when they disagree appreciably?

Thanks!
Mike
Allan Engelhardt
2010-Jul-15 07:54 UTC
[R] question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"
Use the source, Luke.

varImpPlot calls randomForest:::importance.randomForest (yeah, that is three colons), and reading about the scale= parameter in help("importance", package="randomForest") should enlighten you.

For the impatient, try

    varImpPlot(mtcars.rf, scale=FALSE)

Hope this helps a little.

Allan
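To make the scale= point concrete, here is a minimal sketch that refits the model from the question and compares the stored values with what importance() returns. It assumes, per the package help, that scale=TRUE divides the permutation-based measure by its "standard error" (stored in the importanceSD component); the variable names raw, unscaled, and scaled are only for illustration.

```r
library(randomForest)

# Refit the model from the original question.
set.seed(4543)
data(mtcars)
mtcars.rf <- randomForest(mpg ~ ., data = mtcars, ntree = 1000,
                          keep.forest = FALSE, importance = TRUE)

# Values stored in the fitted object (what the question calls $importance).
raw <- mtcars.rf$importance[, "%IncMSE"]

# The same measure as returned by importance(), without and with scaling.
# type = 1 selects the permutation-based measure (%IncMSE for regression).
unscaled <- importance(mtcars.rf, type = 1, scale = FALSE)[, 1]
scaled   <- importance(mtcars.rf, type = 1, scale = TRUE)[, 1]

# Check 1: the unscaled measure should equal the stored values.
same_as_stored <- isTRUE(all.equal(unname(raw), unname(unscaled)))

# Check 2 (assumption from the help page): scaled = stored / "standard error".
sd_vec <- as.numeric(mtcars.rf$importanceSD)
scaled_is_ratio <- isTRUE(all.equal(unname(scaled), unname(raw) / sd_vec))

print(c(same_as_stored = same_as_stored, scaled_is_ratio = scaled_is_ratio))

# Side-by-side view, sorted by the unscaled measure; the two columns can
# rank variables differently because each is divided by its own SD.
print(cbind(unscaled, scaled)[order(-unscaled), ])

# Plot on the same (unscaled) footing as $importance.
varImpPlot(mtcars.rf, scale = FALSE)
```

Neither set of numbers is wrong: they are the same measure with and without division by a per-variable standard error, which is why both the magnitudes and the ordering of close variables such as "cyl" and "hp" can differ.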