Katharine Miller
2011-Aug-04 20:38 UTC
[R] randomForest partial dependence plot variable names
Hello,

I am running randomForest models on a number of species. I would like to automate the printing of partial dependence plots for the most important variables in each model, but I cannot figure out how to pass the variable names to my code. I had originally thought to extract them from the $importance matrix after sorting by a metric (e.g. %IncMSE), but the importance matrix is n by 2, containing only the values for each metric (%IncMSE and IncNodePurity). It is clearly linked to the variable names, but I am unsure how to extract those names for use in scripting. Any assistance would be greatly appreciated: I am currently typing the variable names into each partialPlot call for every model I run, and that is taking a LONG time.

Thanks!
See if the following is close to what you're looking for. If not, please give more detail on what you want to do.

    library(randomForest)
    data(airquality)
    airquality <- na.omit(airquality)
    set.seed(131)
    ozone.rf <- randomForest(Ozone ~ ., airquality, importance=TRUE)
    imp <- importance(ozone.rf)  # get the importance measures
    # the variable names are the row names of the importance matrix
    impvar <- rownames(imp)[order(imp[, 1], decreasing=TRUE)]  # get the sorted names
    op <- par(mfrow=c(2, 3))
    for (i in seq_along(impvar)) {
        partialPlot(ozone.rf, airquality, impvar[i], xlab=impvar[i],
                    main=paste("Partial Dependence on", impvar[i]),
                    ylim=c(30, 70))
    }
    par(op)

Andy
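To extend the same idea to several models and only the top few variables, something like the following might work. It is only a sketch: top_vars and plot_top_vars are hypothetical helper names (not part of randomForest), the toy importance matrix uses made-up values, and `models` is assumed to be a named list of regression forests fitted with importance=TRUE on the data frame `dat`. The plotting helper goes through do.call because, as far as I understand it, partialPlot captures its x.var argument unevaluated, so a bare loop variable can be misread as a column name in some versions of the package; please check the behaviour in your installed version.

```r
# Hypothetical helper (not part of randomForest): names of the n most
# important variables, ranked by one column of an importance matrix.
top_vars <- function(imp, n = 3, metric = 1) {
  ranked <- rownames(imp)[order(imp[, metric], decreasing = TRUE)]
  head(ranked, n)
}

# Toy matrix standing in for importance(fit); the values are made up.
imp <- matrix(c(10, 25, 5, 18,
                100, 300, 50, 200),
              ncol = 2,
              dimnames = list(c("Solar.R", "Temp", "Day", "Wind"),
                              c("%IncMSE", "IncNodePurity")))
top_vars(imp, n = 2, metric = "%IncMSE")
# "Temp" "Wind"

# Sketch for several models. `models` is an assumed named list of forests
# fitted with importance=TRUE; `dat` is the data they were fitted on.
plot_top_vars <- function(models, dat, n = 3) {
  for (sp in names(models)) {
    imp <- randomForest::importance(models[[sp]])
    for (v in top_vars(imp, n)) {
      # do.call passes the variable name as a character string
      do.call(randomForest::partialPlot,
              list(x = models[[sp]], pred.data = dat, x.var = v,
                   xlab = v,
                   main = paste(sp, "- partial dependence on", v)))
    }
  }
}
```

Selecting the column by name (metric = "%IncMSE") rather than by position avoids depending on the column order of the importance matrix.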
Possibly Parallel Threads
- randomForest: predictor importance (for regressions)
- question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"
- Which column in randomForest importances (for regression) is MSE and which IncNodePurity
- Selecting A List of Columns
- Question on: Random Forest Variable Importance for Regression Problems