Xiaoqi Cui
2011-Mar-07 20:27 UTC
[R] use "caret" to rank predictors by random forest model
Hi,

I'm using the package "caret" to rank predictors with a random forest model and to draw a predictor importance plot. I used the commands below:

rf.fit <- randomForest(x, y, ntree=500, importance=TRUE)
## "x" is a matrix whose columns are predictors, "y" is a binary response vector
## Then I got the ranked predictors by ranking rf.fit$importance[,"MeanDecreaseAccuracy"]
## Then draw the importance plot
varImpPlot(rf.fit)

As you can see, all the functions I used come directly from the package "randomForest" rather than from "caret", so I'm wondering whether "caret" has functions that can do the above ranking and plotting.

In fact, I tried the functions "train", "varImp" and "plot" from "caret". The random forest model built by "train" could not be passed correctly to "varImp", which gave an error message like "subscripts out of bounds". The function "plot" doesn't work either.

So I'm wondering if anybody has encountered the same problem before and could shed some light on this. I would really appreciate your help.

Thanks,
Xiaoqi
It would help if you provided the code that you used for the caret functions. The most likely issue is not using importance = TRUE in the call to train().

I believe that I've only implemented code for plotting the varImp objects resulting from train() (e.g. there is plot.varImp.train but not plot.varImp).

Max
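As an illustration of the suggestion above, here is a minimal sketch of the caret workflow with importance = TRUE passed through train(). The simulated data, the object names rf.train and rf.imp, and the 5-fold cross-validation setting are assumptions made for the example; they are not taken from the original poster's code.

```r
library(caret)
library(randomForest)

set.seed(1)
# Hypothetical data: 100 samples, 5 predictors, binary response driven by V1.
x <- matrix(rnorm(500), ncol = 5, dimnames = list(NULL, paste0("V", 1:5)))
y <- factor(ifelse(x[, "V1"] + rnorm(100, sd = 0.5) > 0, "yes", "no"))

# importance = TRUE is handed on to randomForest() through train()'s "..."
rf.train <- train(x, y, method = "rf", ntree = 500, importance = TRUE,
                  trControl = trainControl(method = "cv", number = 5))

rf.imp <- varImp(rf.train)   # object of class "varImp.train"
print(rf.imp)                # importance scores per predictor
plot(rf.imp)                 # dispatches to plot.varImp.train
```

Calling varImp() on the train object, rather than on a bare randomForest fit, is what makes the plot method mentioned above available.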
Xiaoqi Cui
2011-Mar-14 03:49 UTC
[R] use "caret" to rank predictors by random forest model
Thanks for your prompt reply!

You're right, I didn't add the parameter "importance=TRUE" when I used the function "train" to fit the random forest model. Once I added that parameter, everything went well, and the functions "varImp" and "plot" work too. I noticed "caret" is really good at selecting important predictors.

I have another question about using "caret" to select the best subset of predictors. As far as I know, the function "rfe" can be used to select the optimal set of important predictors given a series of candidate subset sizes. I'm wondering if "caret" can automatically give the best size of the selected subset without the user providing the candidate sizes.

Thanks,
Best,
Xiaoqi

----- Original Message -----
From: "Max Kuhn" <mxkuhn at gmail.com>
To: "Xiaoqi Cui" <xcui at mtu.edu>
Cc: r-help at r-project.org
Sent: Monday, March 7, 2011 2:33:06 PM GMT -06:00 US/Canada Central
Subject: Re: [R] use "caret" to rank predictors by random forest model

> It would help if you provided the code that you used for the caret functions. The most likely issue is not using importance = TRUE in the call to train().
>
> I believe that I've only implemented code for plotting the varImp objects resulting from train() (e.g. there is plot.varImp.train but not plot.varImp).
>
> Max
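On the question of subset sizes: rfe() takes its candidates from the sizes argument, so one way to avoid hand-picking them is to list every possible size and let resampling choose. The sketch below assumes simulated data, the built-in rfFuncs helper set, and 5-fold cross-validation; the names ctrl and rfe.fit are made up for the example, and this is only one possible approach rather than a recommendation from the thread.

```r
library(caret)
library(randomForest)

set.seed(1)
# Hypothetical data: 100 samples, 8 predictors, response driven by V1 and V2.
x <- as.data.frame(matrix(rnorm(800), ncol = 8))
y <- factor(ifelse(x$V1 - x$V2 + rnorm(100, sd = 0.5) > 0, "yes", "no"))

# Random forest helper functions shipped with caret, resampled by 5-fold CV.
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)

# Assumption: with no prior idea of the right size, try every subset size.
# rfe() also evaluates the full predictor set in addition to these sizes.
rfe.fit <- rfe(x, y, sizes = 1:(ncol(x) - 1), rfeControl = ctrl)

rfe.fit$optsize        # subset size with the best resampled performance
predictors(rfe.fit)    # names of the predictors in the selected subset
```

Evaluating every size costs more compute as the number of predictors grows, so for wide data a coarser grid of sizes may be the more practical choice.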