Katharine Miller
2011-Aug-04 20:38 UTC
[R] randomForest partial dependence plot variable names
Hello,

I am running randomForest models on a number of species. I would like to automate the printing of partial dependence plots for the most important variables in each model, but I cannot figure out how to pass the variable names into my code.

I had originally thought to extract them from the $importance matrix after sorting by a metric (e.g. %IncMSE), but the importance matrix is n by 2, containing only the values for each metric (%IncMSE and IncNodePurity). It is clearly linked to the variable names, but I am unsure how to extract those names for use in scripting.

Any assistance would be greatly appreciated, as I am currently typing the variable names into each partialPlot call for every model I run, and that is taking a LONG time.

Thanks!
See if the following is close to what you're looking for. If not, please
give more detail on what you want to do.
library(randomForest)

data(airquality)
airquality <- na.omit(airquality)
set.seed(131)
ozone.rf <- randomForest(Ozone ~ ., airquality, importance=TRUE)
imp <- importance(ozone.rf)  # get the importance measures
# the variable names are the row names of imp; sort them by column 1 (%IncMSE)
impvar <- rownames(imp)[order(imp[, 1], decreasing=TRUE)]
op <- par(mfrow=c(2, 3))
for (i in seq_along(impvar)) {
    partialPlot(ozone.rf, airquality, impvar[i], xlab=impvar[i],
                main=paste("Partial Dependence on", impvar[i]),
                ylim=c(30, 70))
}
par(op)
Andy
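For running this over many models, the loop above could be wrapped in a function. The following is only a sketch: plot_top_partial and its arguments n and measure are hypothetical names, not part of randomForest. One caveat, as far as I understand the package source: partialPlot captures its x.var argument unevaluated, so a loop variable that lives inside a function may not resolve the way it does at top level. Passing the name as a character string through do.call is one common workaround.

```r
library(randomForest)

# Hypothetical helper (not part of randomForest): draw partial dependence
# plots for the n most important predictors of one fitted model.
plot_top_partial <- function(rf, data, n = 3, measure = 1) {
  imp <- importance(rf)
  # Row names of the importance matrix are the predictor names.
  top <- rownames(imp)[order(imp[, measure], decreasing = TRUE)]
  top <- head(top, n)
  for (v in top) {
    # do.call passes the variable name as a character value rather than
    # as the unevaluated symbol `v`.
    do.call(partialPlot,
            list(x = rf, pred.data = data, x.var = v, xlab = v,
                 main = paste("Partial Dependence on", v)))
  }
  invisible(top)  # return the names that were plotted
}

# Usage on the same airquality example
data(airquality)
aq <- na.omit(airquality)
set.seed(131)
rf <- randomForest(Ozone ~ ., aq, importance = TRUE)
op <- par(mfrow = c(1, 3))
top <- plot_top_partial(rf, aq, n = 3)
par(op)
```

With one fitted model and one data frame per species, the same function could be called in a loop over those objects; how they are stored (for example in a named list) is an assumption left to the reader.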