thr3ads.net - similar to: "Random Forest Reading N/A's, I don't see them"

Displaying 20 results from an estimated 2000 matches similar to: "Random Forest Reading N/A's, I don't see them"

use "caret" to rank predictors by random forest model

2011 Mar 07

Hi, I'm using package "caret" to rank predictors using a random forest model and to draw a predictor importance plot. I used the commands below: rf.fit<-randomForest(x,y,ntree=500,importance=TRUE) ## "x" is a matrix whose columns are predictors, "y" is a binary response vector ## Then I got the ranked predictors by ranking

How do I make R randomForest model size smaller?

2012 Dec 03

I've been training randomForest models on 7 million rows of data (41 features). Here's an example call: myModel <- randomForest(RESPONSE~., data=mydata, ntree=50, maxnodes=30) I thought surely with only 50 trees and 30 terminal nodes that the memory footprint of "myModel" would be small. But it's 65 megs in a dump file. The object seems to be holding all sorts of

sampsize in Random Forests

2008 Mar 09

Hi all, I have a dataset where each point is assigned to a class A, B, C, or D. Each point is also assigned to a study site. Each study site is coded with a number ranging between 1-100. This information is stored in the vector studySites. I want to run randomForests using stratified sampling, so I chose the option strata = factor(studySites) But I am not sure how to control the number of

No Data in randomForest predict

2012 May 05

I would like to ask a general question about the randomForest predict function and how it handles No Data values. I understand that you can omit No Data values while developing the randomForest object, but how does it handle No Data in the prediction phase? I would like the output to be NA if any (not just all) of the input data have an NA value. It is not clear to me if this is the default or
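
One way to get the behaviour asked about here, without relying on what predict() does by default, is to predict only on complete rows and leave NA everywhere else. This is a minimal sketch on the built-in iris data; the names fit, newdata, ok and pred are made up for illustration, and the randomForest package is assumed to be installed.

```r
library(randomForest)

set.seed(42)
fit <- randomForest(Species ~ ., data = iris, ntree = 100)

# Hypothetical new data: three rows of predictors, one value blanked out
newdata <- iris[c(1, 51, 101), 1:4]
newdata[2, "Sepal.Width"] <- NA

# TRUE only for rows with no missing predictor at all
ok <- complete.cases(newdata)

# Start with all-NA predictions, then fill in the complete rows
pred <- factor(rep(NA, nrow(newdata)), levels = levels(iris$Species))
pred[ok] <- predict(fit, newdata[ok, , drop = FALSE])

print(pred)
```

Any row with at least one NA input keeps NA in pred, which is the "any, not just all" rule described in the question.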

substituting dots in the names of the columns (sub, gsub, regexpr)

2007 Jul 26

Dear R users, I have the following two problems, related to the functions sub, grep, regexpr and similar. The header of the file(s) I have to import is like this. c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.", "P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)") To get rid of spaces and
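
The excerpt is cut off before it states the exact target format, so the following is only a sketch under an assumption: that the aim is to turn each header into a name with no spaces, brackets or slashes, using dots as separators. The variable names hdr and clean are made up for illustration; only base R is used.

```r
# Header strings as quoted in the question
hdr <- c("y (m)", "BD (g/cm3)", "PR (Mpa)", "Ks (m/s)", "SP g./g.",
         "P (m3/m3)", "theta1 (g/g)", "theta2 (g/g)", "AWC (g/g)")

# Collapse every run of non-alphanumeric characters into a single dot
clean <- gsub("[^[:alnum:]]+", ".", hdr)

# Remove dots left over at the start or end of a name
clean <- gsub("^\\.+|\\.+$", "", clean)

print(clean)
```

If valid R names are all that is needed, base R's make.names() is an alternative, though it keeps repeated and trailing dots.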

Help!

2010 Sep 20

Please I need some help using R to analyze my data. What I would like to do is to repeat the same basic process (e.g. linear regression between wood density and distance from pith) for at least 240 data subsets within the main data-frame. Within the main data-frame, these data subsets will be defined by three variables namely, species, individual and core (i.e. 20 species, at least 6

randomForest: help with combine() function

2010 Dec 11

I've built two RF objects (RF1 and RF2) and have tried to combine them, but I get the following error: Error in rf$votes + ifelse(is.na(rflist[[i]]$votes), 0, rflist[[i]]$votes) : non-conformable arrays In addition: Warning message: In rf$oob.times + rflist[[i]]$oob.times : longer object length is not a multiple of shorter object length Both RF models use the same variables, although

randomForest, 'No forest component...' error while calling Predict()

2008 Jun 15

Dear R-users, While making a prediction using the randomForest function (package randomForest) I'm getting the following error message: "Error in predict.randomForest(model, newdata = CV) : No forest component in the object" Here's my complete code. For reproducing this task, please find my 2 data sets attached ( http://www.nabble.com/file/p17855119/data.rar data.rar ).

boxplot help

2010 Jan 06

Dear R experts, I am trying to add a '+' identifying the mean in a boxplot using the following: sizelist <- split(size, grp) centers <- boxplot(sizelist, style.bxp = "att", medpch = "o", ylab = "Prostate Volume (cm3)") points(centers, unlist(lapply(sizelist, mean)), pch = "+") But I get the error: Error in xy.coords(x, y) :

Random Forest, Giving More Importance to Some Data

2013 Mar 24

Dear All, I am using randomForest to predict the final selling price of some items. As often happens, I have a lot of (noisy) historical data, but the question is not so much about data cleaning. The data for which I need to carry out predictions are fairly recent sales or even sales that will take place in the near future. As a consequence, historical data should be somehow

random forest

2012 Oct 22

Hi all, Can someone tell me the difference between the following two formulas? 1. epiG.rf <-randomForest(gamma~.,data=data, na.action = na.fail,ntree = 300,xtest = NULL, ytest = NULL,replace = T, proximity =F) 2. epiG.rf <-randomForest(gamma~.,data=data, na.action = na.fail,ntree = 300,xtest = NULL, ytest = NULL,replace = T, proximity =F)

Random Forest Error for Factor to Character column

2013 Jan 15

Hi, Can someone please offer me some guidance? I imported some data. One of the columns called "JOBTITLE" when imported was imported as a factor column with 416 levels. I subset the data in such a way that only 4 levels have data in "JOBTITLE" and tried running randomForest but it complained about "JOBTITLE" having more than 32 categories. I know that is the limit

Random Forest

2007 Apr 23

Hi, I am trying to print out my confusion matrix after having created my random forest. I have put in this command: fit<-randomForest(MMS_ENABLED_HANDSET~.,data=dat,ntree=500,mtry=14, na.action=na.omit,confusion=TRUE) but I can't get it to give me the confusion matrix. Does anyone know how this works? Thanks! Ruben
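
For a classification forest, the fitted randomForest object stores the out-of-bag confusion matrix in its confusion component, so no extra argument should be needed in the call. As far as I know there is no confusion argument, but treat that as an assumption. This is a minimal sketch on the built-in iris data rather than the poster's dat, assuming the randomForest package is installed.

```r
library(randomForest)

set.seed(1)
fit <- randomForest(Species ~ ., data = iris, ntree = 500)

# OOB confusion matrix: one row per true class, one column per
# predicted class, plus a final class.error column
cm <- fit$confusion
print(cm)
```

Printing the fitted object itself (print(fit)) shows the same matrix along with the OOB error estimate.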

Re-evaluating the tree in the random forest

2005 Sep 08

Dear mailinglist members, I was wondering if there was a way to re-evaluate the instances of a tree (in the forest) again after I have manually changed a splitpoint (or split variable) of a decision node. Here's an illustration: library("randomForest") forest.rf <- randomForest(formula = Species ~ ., data = iris, do.trace = TRUE, ntree = 3, mtry = 2, norm.votes = FALSE) # I am

Random Forest prediction questions

2010 Mar 01

Hi, I need help with the randomForest prediction. I ran the following code: > iris.rf <- randomForest(Species ~ ., data=iris, > importance=TRUE,keep.forest=TRUE, proximity=TRUE) > pr<-predict(iris.rf,iris,predict.all=T) > iris.rf$votes[53,] setosa versicolor virginica 0.0000000 0.8074866 0.1925134 > table(pr$individual[53,])/500 versicolor virginica 0.928

Help me! using random Forest package, how to calculate Error Rates in the training set ?

2010 Jan 11

I am now learning random forests and using the randomForest package. I can get the OOB error rate and the test set error rate; now I want to get the training set error rate. How can I do that? pgp.rf<-randomForest(x.tr,y.tr,x.ts,y.ts,ntree=1e3,keep.forest=FALSE,do.trace=1e2) Using this code I can get the OOB and test set error rates. If I replace x.ts and y.ts with x.tr and y.tr, respectively, is the error rate

Random Forest with a small "n" and many predictors

2018 Dec 13

Hello, I recently started with Machine Learning, and I have a question about my data sets: the first has 37 explanatory variables and 116 instances, and the second has 140 explanatory variables and 195 instances. The first looks fine to me, since there are 3 times as many cases as explanatory variables, but I think the second may be a problem, as there is almost the same number of

Random Forest

2010 Feb 16

Hi, I'm using the randomForest package and I have 2 questions: 1. Can I drop one tree from an RF object? 2. I have a 300-tree forest, but when I use the predict function on new data (with predict.all=TRUE) I get only 270 votes. Did I do something wrong? Thanks

Random Forest % Variation vs Pseudo-R^2?

2009 Jun 08

Hi all (and Andy!), When running a randomForest run in R, I get the last part of an output (with do.trace=T) that looks like this: 1993 | 0.04606 130.43 | 1994 | 0.04605 130.40 | 1995 | 0.04605 130.43 | 1996 | 0.04605 130.43 | 1997 | 0.04606 130.44 | 1998 | 0.04607 130.47 | 1999 | 0.04606 130.46 | 2000 | 0.04605 130.42 | With the first column representing the

Variable Importance - Random Forest

2007 Aug 24

Hello, I am trying to explore the use of random forests for classification and am uncertain about the interpretation of the importance measurements. When having the option "importance = T" in the randomForest call, the resulting 'importance' element matrix has four columns with the following headings: 0 - mean raw importance score of variable x for class 0 (where
