lara harrup (IAH-P)
2009-Jun-24 16:04 UTC
[R] Random Forest Variable Importance Interpretation
Hi,

I am trying to use random forests for regression to identify the important environmental/microclimate variables for predicting the abundance of a species in different habitats. There are approximately 40 variables and between 200 and 500 data points, depending on the dataset.

I have successfully used the randomForest package to run the analysis, looked at and plotted the %IncMSE and IncNodePurity values, and looked at the partial dependence plots for the effect of the different variables on the response. I have been looking through the literature to see how people have previously used this type of analysis, and I would like to plot the overall variable importance in some form of PCA scree graph, but I haven't got a clue how to even start. Any suggestions would be most appreciated.

Many thanks in advance,
Lara
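For readers landing on this question: a minimal sketch of what a scree-style importance plot might look like, assuming the randomForest package and a model fitted with `importance = TRUE`. The data frame `env`, its column names, and the simulated response are hypothetical stand-ins, not the poster's data.

```r
library(randomForest)

set.seed(1)
# Simulated stand-in for the real data (assumption): 300 sites, 40 predictors,
# with abundance driven by only the first three variables.
n <- 300
p <- 40
env <- as.data.frame(matrix(rnorm(n * p), n, p))
names(env) <- paste0("var", seq_len(p))
env$abundance <- 2 * env$var1 - 1.5 * env$var2 + env$var3^2 + rnorm(n)

# importance = TRUE is needed to get the permutation measure (%IncMSE).
rf <- randomForest(abundance ~ ., data = env, importance = TRUE, ntree = 500)

# type = 1 is the permutation measure (%IncMSE); type = 2 is IncNodePurity.
imp <- importance(rf, type = 1)[, 1]
imp_sorted <- sort(imp, decreasing = TRUE)

# Scree-style plot: importance against rank, largest first.
plot(imp_sorted, type = "b", xaxt = "n",
     xlab = "", ylab = "%IncMSE",
     main = "Variable importance, ranked")
axis(1, at = seq_along(imp_sorted), labels = names(imp_sorted),
     las = 2, cex.axis = 0.6)
```

As with a PCA scree plot, the point of interest is the "elbow" where the curve flattens; the package's own `varImpPlot(rf)` shows the same ranking as a dot chart.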
Hi,

Are you looking for variable selection? If so, you can use LASSO, elastic net, or sparse PLS regression, all of which encourage variable selection. PCA does not select variables, because every variable contributes to each PC. You could use sparse PCA instead.

Regards,
Alex
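To illustrate the LASSO suggestion, here is a rough sketch using the glmnet package. The matrix `x`, the response `y`, and the simulated effect sizes are hypothetical; a Gaussian response is assumed for simplicity, which may not suit real abundance counts.

```r
library(glmnet)

set.seed(1)
# Simulated stand-in data (assumption): 40 predictors, two of them informative.
n <- 300
p <- 40
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("var", seq_len(p))))
y <- 2 * x[, "var1"] - 1.5 * x[, "var2"] + rnorm(n)

# alpha = 1 is the LASSO; 0 < alpha < 1 gives an elastic net.
cvfit <- cv.glmnet(x, y, alpha = 1)

# Coefficients at the more conservative cross-validated lambda;
# variables with a zero coefficient have been dropped by the penalty.
b <- as.matrix(coef(cvfit, s = "lambda.1se"))
selected <- setdiff(rownames(b)[b[, 1] != 0], "(Intercept)")
print(selected)
```

Unlike the random forest importance ranking, this gives an explicit subset of variables, but it only captures linear effects unless transformed terms are added to `x`.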