Displaying 20 results from an estimated 4000 matches similar to: "Question on: Random Forest Variable Importance for Regression Problems"
2011 Aug 04
1
randomForest partial dependence plot variable names
Hello,
I am running randomForest models on a number of species. I would like to be
able to automate the printing of dependence plots for the most important
variables in each model, but I am unable to figure out how to enter the
variable names into my code. I had originally thought to extract them from
the $importance matrix after sorting by metric (e.g. %IncMSE), but the
importance matrix is n
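One way to automate this is to sort the importance matrix by the chosen column, take the row names, and pass each name to `partialPlot()` through `do.call()`. `partialPlot()` captures `x.var` unevaluated, so a loop variable such as `v` written directly would be read as a column literally named "v". This is a minimal sketch under assumptions: the helper `top_vars()` is hypothetical, and `mtcars`/`mpg` stand in for the poster's species data.

```r
# Return the names of the n most important predictors, ranked by one
# column of an importance matrix (e.g. "%IncMSE" or "IncNodePurity").
top_vars <- function(imp, metric = colnames(imp)[1], n = 3) {
  ranked <- rownames(imp)[order(imp[, metric], decreasing = TRUE)]
  head(ranked, n)
}

# Demonstration on a built-in data set; runs only if randomForest is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(1)
  rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)
  for (v in top_vars(importance(rf), "%IncMSE", n = 3)) {
    # do.call() passes the *value* of v (a character string) as x.var,
    # so partialPlot() sees e.g. "wt" rather than the symbol v.
    do.call("partialPlot", list(x = rf, pred.data = mtcars, x.var = v,
                                xlab = v,
                                main = paste("Partial dependence on", v)))
  }
}
```

Setting `xlab` and `main` explicitly avoids default labels built from the quoted string.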
2010 May 05
1
randomForest: predictor importance (for regressions)
I have a question about predictor importances in randomForest.
Once I've run randomForest and got my object, I get their importances:
rfresult$importance
I also get the "standard errors" of the permutation-based importance
measure: rfresult$importanceSD
I have 2 questions:
1. Because I am dealing with regressions, I am getting an importance object
(rfresult$importance) with two
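As I read the randomForest source, `rfresult$importance` stores the unscaled permutation measure and `rfresult$importanceSD` its standard errors (a length-p vector for regression). `importance(rfresult)`, with its default `scale = TRUE`, divides the first by the second and substitutes 1 for SDs that are effectively zero. The sketch below reproduces that division by hand; `scale_importance()` is a hypothetical helper and `mtcars` is only an example data set, so verify the comparison on your installed version.

```r
# Divide raw permutation importances by their standard errors, replacing
# SDs that are effectively zero by 1 (mirrors what importance(scale = TRUE)
# appears to do for the permutation column).
scale_importance <- function(raw, sd) {
  raw / ifelse(sd < .Machine$double.eps, 1, sd)
}

if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(2)
  rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)
  by_hand <- scale_importance(rf$importance[, "%IncMSE"], rf$importanceSD)
  # Compare with the package's own scaled values.
  print(all.equal(unname(by_hand), unname(importance(rf)[, "%IncMSE"])))
}
```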
2010 Jul 13
1
question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"
Hi everyone,
I have another "Random Forest" package question:
- my (presumably incorrect) understanding of the varImpPlot is that it
should plot the "% increase in MSE" and "IncNodePurity" exactly as can be
found from the "importance" section of the model results.
- However, the plot does not, in fact, match the "importance"
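A mismatch is expected if `varImpPlot()` plots the output of `importance()`, which by default scales the permutation measure by its standard error, while `model$importance` holds the unscaled values. As far as I know, `varImpPlot()` accepts a `scale` argument and invisibly returns the matrix it plotted, so the sketch below (with `mtcars` as a stand-in data set) should line up with `model$importance` when `scale = FALSE`. Treat this as an assumption to check against your version of the package.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(3)
  model <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)
  # scale = FALSE asks importance() for the raw values stored in model$importance.
  plotted <- varImpPlot(model, scale = FALSE)
  print(all.equal(plotted, model$importance))
}
```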
2013 May 17
2
Selecting A List of Columns
Dear R Helpers,
I need help with a slightly unusual situation in which I am trying to
select some columns from a data frame. I know how to use the subset
statement with column names as in:
x=as.data.frame(matrix(c(1,2,3,
1,2,3,
1,2,2,
1,2,2,
1,1,1),ncol=3,byrow=T))
all.cols<-colnames(x)
to.keep<-all.cols[1:2]
Kept<-subset(x,select=to.keep)
Kept
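The post is cut off before the unusual part is described. For reference, the same selection can be written without `subset()`, which also works when the column names are held in a character vector built at run time. The `grepl()` pattern below is only an illustrative assumption about how names might be chosen programmatically.

```r
x <- as.data.frame(matrix(c(1, 2, 3,
                            1, 2, 3,
                            1, 2, 2,
                            1, 2, 2,
                            1, 1, 1), ncol = 3, byrow = TRUE))
to.keep <- colnames(x)[1:2]

# Index by a character vector of names; drop = FALSE keeps a data frame
# even when only one column is selected.
kept1 <- x[, to.keep, drop = FALSE]

# List-style indexing gives the same columns and always returns a data frame.
kept2 <- x[to.keep]

# Names can also be chosen programmatically, e.g. by a regular expression.
kept3 <- x[, grepl("^V[12]$", colnames(x)), drop = FALSE]
```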
2010 May 05
0
Which column in randomForest importances (for regression) is MSE and which IncNodePurity
I've run the function randomForest with importance=T. All my variables
(predictors and the dependent variable) are numeric.
rf<-randomForest(formula, data=mydata, importance=T, etc.)
my results object "rf" contains predictor importances:
rf$importance
I am seeing two columns:
%IncMSE IncNodePurity
V1 -0.01683558 58.10910
V2 0.04000299 71.27579
V3 0.01974636
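For a regression forest fitted with `importance = TRUE`, the first column (`%IncMSE`) is the permutation measure, and the second (`IncNodePurity`) is the total decrease in node impurity, measured by residual sum of squares, from splits on that variable, averaged over trees. `importance()` can return either one through its `type` argument. A short sketch, using `mtcars` purely as an example:

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(4)
  rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)
  print(colnames(rf$importance))       # "%IncMSE" "IncNodePurity"
  perm <- importance(rf, type = 1)     # permutation measure only (%IncMSE)
  purity <- importance(rf, type = 2)   # node-impurity measure only (IncNodePurity)
  print(cbind(perm, purity))
}
```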
2009 Jun 24
1
Random Forest Variable Importance Interpretation
Hi
I am trying to explore the use of random forests for regression to
identify the important environmental/microclimate variables involved in
predicting the abundance of a species in different habitats. There are
approximately 40 variables and between 200 and 500 data points, depending on the
dataset. I have successfully used the randomForest package to conduct
the analysis and looked at the %IncMSE
2011 Sep 20
1
randomForest - NaN in %IncMSE
Hi
I am having a problem using varImpPlot in randomForest. I get the error
message "Error in plot.window(xlim = xlim, ylim = ylim, log = "") : need
finite 'xlim' values"
When I print $importance, several variables have NaN under %IncMSE. There
are no NaNs in the original data. Can someone help me figure out what is
happening here?
Thanks!
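The post does not show where the NaN values come from. The plotting error, however, is consistent with `varImpPlot()` computing its x-axis limits from `min()` and `max()` of each importance column, which is my reading of its source, so a single NaN propagates into `xlim`. A workaround sketch is to locate the non-finite rows and draw the dot chart directly from the finite values. The matrix below is hypothetical: V1 and V2 reuse %IncMSE values quoted elsewhere on this page, and `Vx` is an artificial NaN.

```r
# Hypothetical importance matrix with one NaN, standing in for importance(rf).
imp <- matrix(c(-0.01683558, 0.04000299, NaN), ncol = 1,
              dimnames = list(c("V1", "V2", "Vx"), "%IncMSE"))

bad <- rownames(imp)[!is.finite(imp[, "%IncMSE"])]   # variables to investigate
good <- imp[is.finite(imp[, "%IncMSE"]), "%IncMSE"]  # named numeric vector

# dotchart() only sees finite values, so its axis limits are finite.
dotchart(sort(good), xlab = "%IncMSE", main = "Finite importances only")
```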
2012 Aug 27
1
interpret the importance output?
> importance(rfor.pdp11_t25.comb1,type=1)
%IncMSE
v1 -0.28956401263
v2 1.92865561147
v3 -0.63443929130
v4 1.58949137047
v5 0.03190940065
I wasn't confident interpreting these results from the
documentation alone.
Could you please help me interpret them?
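Loosely, each `%IncMSE` value measures how much the out-of-bag prediction error grows when that variable's values are randomly permuted. A value near zero or below zero suggests that shuffling the variable did not hurt predictions, so the forest gained little from it beyond noise. The base-R sketch below illustrates the permutation idea, with `lm()` on the training data standing in for a forest and its out-of-bag samples. It is a simplified illustration, not randomForest's actual computation (which works tree by tree and, with the default `scale = TRUE`, divides by a standard error). `perm_increase()` is a hypothetical helper.

```r
# Increase in mean squared error when one predictor is shuffled.
perm_increase <- function(fit, data, response, var) {
  mse <- function(d) mean((d[[response]] - predict(fit, newdata = d))^2)
  shuffled <- data
  shuffled[[var]] <- sample(shuffled[[var]])   # break the link to the response
  mse(shuffled) - mse(data)
}

set.seed(5)
fit <- lm(mpg ~ wt + qsec + drat, data = mtcars)
inc <- sapply(c("wt", "qsec", "drat"),
              function(v) perm_increase(fit, mtcars, "mpg", v))
print(round(inc, 3))   # larger values = the model leans harder on that variable
```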
2010 Aug 06
1
Error on random forest variable importance estimates
Hello,
I am using the R randomForest package to classify variable stars. I have
a training set of 1755 stars described by (too) many variables. Some of
these variables are highly correlated.
I believe that I understand how randomForest works and how variable
importance is evaluated (through variable permutations). Here are my
questions.
1) Variable importance error? Is there any way
2010 Apr 29
1
variable importance in Random Forest
Hi Andy,
I ran randomForest in R and got the following results for variable
importance:
What is the meaning of MeanDecreaseAccuracy and MeanDecreaseGini?
They appear to be raw values that are not scaled to 1, right?
Which column is most similar to the variable rel.influence in boosting?
Thanks so much!
> fit$importance
0 1
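For a classification forest fitted with `importance = TRUE`, the importance matrix has one permutation column per class, followed by `MeanDecreaseAccuracy` (permutation importance over all classes) and `MeanDecreaseGini` (total decrease in Gini impurity). Neither is normalised to sum to 1. If a percentage scale is wanted for comparison with other methods, one option is to divide a non-negative column by its total, as sketched below; this is only a rescaling, and the measures are still computed differently. `to_percent()` is a hypothetical helper and `iris` an example data set.

```r
# Express an importance vector as percentages of its total (sums to 100).
to_percent <- function(v) 100 * v / sum(v)

if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(6)
  fit <- randomForest(Species ~ ., data = iris, importance = TRUE)
  # Per-class columns, then "MeanDecreaseAccuracy" and "MeanDecreaseGini".
  print(colnames(fit$importance))
  print(round(to_percent(fit$importance[, "MeanDecreaseGini"]), 1))
}
```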
2007 Aug 24
2
Variable Importance - Random Forest
Hello,
I am trying to explore the use of random forests for classification and
am uncertain about the interpretation of the importance measurements.
When having the option "importance = T" in the randomForest call, the
resulting 'importance' element matrix has four columns with the
following headings:
0 - mean raw importance score of variable x for class 0 (where
2004 Jun 04
1
rpart
Hello everyone,
I'm a newbie to R and to CART, so I hope my questions don't seem too stupid.
1.)
My first question concerns the rpart() method. Which method does rpart use in
order to get the best split - entropy impurity, Bayes error (min. error) or Gini
index? Is there a way to make it use the entropy impurity?
The second and third question concern the output of the printcp() function.
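On the first question, rpart's documentation describes Gini as the default splitting index for classification trees and `parms = list(split = "information")` as the way to request the information (entropy) criterion. A minimal sketch using the `kyphosis` data that ships with rpart:

```r
if (requireNamespace("rpart", quietly = TRUE)) {
  library(rpart)  # rpart is a recommended package distributed with R

  # Default classification splitting index (Gini).
  fit_gini <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                    method = "class")

  # Request the information (entropy) criterion instead.
  fit_info <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                    method = "class", parms = list(split = "information"))

  printcp(fit_info)  # the complexity-parameter table the other questions ask about
}
```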
2013 Mar 24
1
Random Forest, Giving More Importance to Some Data
Dear All,
I am using randomForest to predict the final selling price of some items.
As it often happens, I have a lot of (noisy) historical data, but the
question is not so much about data cleaning.
The data for which I need to make predictions are fairly
recent sales, or even sales that will take place in the near future.
As a consequence, historical data should be somehow
2010 Mar 01
1
Random Forest prediction questions
Hi,
I need help with randomForest prediction. I ran the following code:
> iris.rf <- randomForest(Species ~ ., data=iris,
+ importance=TRUE,keep.forest=TRUE, proximity=TRUE)
> pr<-predict(iris.rf,iris,predict.all=T)
> iris.rf$votes[53,]
setosa versicolor virginica
0.0000000 0.8074866 0.1925134
> table(pr$individual[53,])/500
versicolor virginica
0.928
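The two numbers answer different questions. `iris.rf$votes` holds out-of-bag votes, i.e. only from the trees for which case 53 was not in the bootstrap sample, whereas `predict(iris.rf, iris, predict.all = TRUE)` sends case 53 down all 500 trees, including those trained on it. A sketch of the comparison (seed and variable names are illustrative):

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(7)
  iris.rf <- randomForest(Species ~ ., data = iris, importance = TRUE,
                          keep.forest = TRUE, proximity = TRUE)
  pr <- predict(iris.rf, iris, predict.all = TRUE)

  oob_share <- iris.rf$votes[53, ]                     # OOB trees only
  all_share <- table(factor(pr$individual[53, ],
                            levels = levels(iris$Species))) /
    iris.rf$ntree                                      # every tree, in-bag included
  print(rbind(oob = oob_share, all_trees = as.numeric(all_share)))

  # predict() without newdata returns the OOB class predictions instead.
  print(head(predict(iris.rf)))
}
```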
2009 Mar 27
1
Random Forest Variable Importance
Hello,
I have a randomForest object, iris.rf, fitted with importance = TRUE.
What is the difference between "iris.rf$importance" and "importance(iris.rf)"?
Thank you in advance,
Best,
Li GUO
2010 Mar 16
1
Regarding variable importance in the randomForest package
For anyone who is knowledgeable about the randomForest package in R, I have
a question:
When I look at the variable importance for my data, I see that my response
variable is included along with my predictor variables. That is, I am
getting a MeanDecreaseGini for my response variable, and therefore it seems
as though it is being treated as a predictor variable.
My code (in case it helps):
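The code is cut off here, so the cause cannot be confirmed. One common way this happens, offered as an assumption: writing the response as `mydata$y` on the left of the formula while using `.` with `data = mydata`. The dot expands to every column not already named in the formula, and `mydata$y` is not the same name as `y`, so the response column re-enters as a predictor. The check below uses only base R's `terms()`; the data frame is hypothetical.

```r
# Hypothetical data: response y and two predictors.
mydata <- data.frame(y = factor(c("a", "b", "a", "b")),
                     x1 = 1:4, x2 = c(2, 7, 1, 8))

# '.' expands to all columns except those named in the formula.
predictors_bad  <- attr(terms(mydata$y ~ ., data = mydata), "term.labels")
predictors_good <- attr(terms(y ~ ., data = mydata), "term.labels")

print(predictors_bad)   # includes "y": the response leaks in as a predictor
print(predictors_good)  # "x1" "x2"
```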
2009 Jun 08
1
Random Forest % Variation vs Pseudo-R^2?
Hi all (and Andy!),
When I run randomForest in R with do.trace=T, the last part of the
output looks like this:
1993 | 0.04606 130.43 |
1994 | 0.04605 130.40 |
1995 | 0.04605 130.43 |
1996 | 0.04605 130.43 |
1997 | 0.04606 130.44 |
1998 | 0.04607 130.47 |
1999 | 0.04606 130.46 |
2000 | 0.04605 130.42 |
With the first column representing the
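Assuming the trace follows randomForest's convention, the two numbers are the out-of-bag MSE and that MSE as a percentage of the variance of y. The pseudo-R^2 reported as `rsq` is then 1 - MSE/Var(y), i.e. 1 - %Var(y)/100. Plugging in the last line of the trace above:

```r
# Last line of the do.trace output: tree 2000, OOB MSE and %Var(y).
mse <- 0.04605
pct_var_y <- 130.42

var_y <- mse / (pct_var_y / 100)  # variance of y implied by the two columns
pseudo_r2 <- 1 - pct_var_y / 100  # same quantity as 1 - mse / var_y
print(c(var_y = var_y, pseudo_r2 = pseudo_r2))
```

A negative value, as here, means the out-of-bag predictions do worse than simply predicting the mean of y.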
2008 Oct 02
1
specifying x-axis scale on random forest variable importance plot
I am new to R and am using the randomForest package. Is there a way to specify
the x-axis scale range for the variable importance plot? Many thanks.
-alison
Sent from the R help mailing list archive at Nabble.com.
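In the versions of `varImpPlot()` I have looked at, the function sets `xlim` itself when it calls `dotchart()`, so passing `xlim` through `...` is likely to fail with a duplicate-argument error. A workaround sketch is to take the importance values and call `dotchart()` directly with your own range. The values below are the `%IncMSE` figures quoted in the "interpret the importance output?" post on this page, used only as sample input; in practice they would come from `importance(rf, type = 1)`.

```r
# %IncMSE values from the importance(..., type = 1) output quoted on this page.
imp <- c(v1 = -0.28956401263, v2 = 1.92865561147, v3 = -0.63443929130,
         v4 = 1.58949137047, v5 = 0.03190940065)

ord <- sort(imp)                 # smallest at the bottom, as varImpPlot draws it
dotchart(ord, xlab = "%IncMSE",
         xlim = c(-1, 2))        # choose the x-axis range explicitly
```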
2008 Jul 05
1
Random Forest %var(y)
The verbose option gives a display like:
> rf.500 <-
+ randomForest(new.x,trn.y,do.trace=20,ntree=100,nodesize=500,
+ importance=T)
| Out-of-bag |
Tree | MSE %Var(y) |
20 | 0.9279 100.84 |
What is the meaning of %Var(y) > 100%? I expected that to correspond to a
model that was worse than random, but the predictions seem much better than
that on
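The 100% mark corresponds to a model that always predicts the mean of y, not to a random model. The base-R sketch below computes the same percentage from any vector of predictions, assuming the variance uses denominator n (as randomForest's `rsq` does). `pct_var_y()` is a hypothetical helper and `mtcars$mpg` an example response.

```r
# Prediction error as a percentage of the variance of y (denominator n).
pct_var_y <- function(y, pred) {
  100 * mean((y - pred)^2) / mean((y - mean(y))^2)
}

y <- mtcars$mpg
baseline <- pct_var_y(y, rep(mean(y), length(y)))      # always predict the mean
reversed <- pct_var_y(y, mean(y) - 0.5 * (y - mean(y)))  # anti-correlated guesses
print(c(baseline = baseline, reversed = reversed))
```

Anything above 100 therefore means larger squared errors than the constant-mean baseline.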
2009 Apr 28
1
Problem with Random Forest predict
I am trying to run a partialPlot with Random Forest (as I have done many times before).
First I run my forest. Cell is a 6-level factor that is the dependent variable; all other variables are predictors, and most of these are factors as well.
predCell<-randomForest(x=tempdata[-match("Cell",names(tempdata))],y=tempdata$Cell,importance=T)
Then I try my partial plot to look at the
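The post is cut off before the problem is described, so this is only a general sketch. For a classification forest, `partialPlot()` takes a `which.class` argument naming the factor level whose vote fraction (on the log scale described in `?partialPlot`) is plotted. Here `iris` stands in for `tempdata` and `Species` for `Cell`; these substitutions are assumptions.

```r
if (requireNamespace("randomForest", quietly = TRUE)) {
  library(randomForest)
  set.seed(8)
  # x/y interface as in the post: drop the response column from the predictors.
  pred <- iris[-match("Species", names(iris))]
  fit <- randomForest(x = pred, y = iris$Species, importance = TRUE)
  # which.class selects the response level; plot = FALSE returns the curve
  # (a list with components x and y) without drawing it.
  pd <- partialPlot(fit, pred.data = pred, x.var = "Petal.Width",
                    which.class = "versicolor", plot = FALSE)
  str(pd)
}
```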