thr3ads.net - similar to: "RandomForest and Missing Values"

Displaying 20 results from an estimated 6000 matches similar to: "RandomForest and Missing Values"

RandomForest, Party and Memory Management

2013 Feb 03

Dear All, For a data mining project, I am relying heavily on the RandomForest and Party packages. Due to the large size of the data set, I often have memory problems (in particular with the Party package; RandomForest seems to use less memory). I really have two questions at this point. 1) Please see how I am using the Party and RandomForest packages. Any comment is welcome and useful.

party::cforest - predict?

2013 Feb 14

What is the function call interface for predict in the party package for cforest? I am looking at the documentation (the vignette) and ?cforest, and from the examples I see that one can call predict on a cforest classifier. predict seems to be a method of the class RandomForest, objects of which are returned by cforest.
---------------------------
> cf.model =

randomForest and missing data

2007 Jan 04

Does anyone know a reason why, in principle, a call to randomForest cannot accept a data frame with missing predictor values? If each individual tree is built using CART, then it seems like this should be possible. (I understand that one may impute missing values using rfImpute or some other method, but I would like to avoid doing that.) If this functionality were available, then when the trees

Parallelizing GBM

2013 Mar 24

Dear All, I am far from being a guru about parallel programming. Most of the time, I rely on randomForest for data mining large datasets. I would like to try the gradient boosted methods in GBM as well, but I need parallelization. I normally rely on gbm.fit for speed reasons, and I usually call it this way: gbm_model <- gbm.fit(trainRF,prices_train, offset = NULL, misc =

Random Forest, Giving More Importance to Some Data

2013 Mar 24

Dear All, I am using randomForest to predict the final selling price of some items. As often happens, I have a lot of (noisy) historical data, but the question is not so much about data cleaning. The data sets for which I need to make predictions are fairly recent sales, or even sales that will take place in the near future. As a consequence, historical data should be somehow

rfImpute (for randomForest) crashed

2003 Aug 26

In trying to execute this line in R (Version 1.7.1 (2003-06-16), under Windows XP Pro), with the randomForest library (about two weeks old) loaded, the program crashed: bost4rf <- rfImpute(TargetDensity~.,data=bost4rf0) Specifically, an XP dialog box popped up, saying "R for windows GUI front-end has encountered a problem and needs to close." That was the dialog asking whether I

rfImpute

2007 Aug 10

I am having trouble with the rfImpute function in the randomForest package. Here is a sample...
clunk.roughfix<-na.roughfix(clunk)
>
> clunk.impute<-rfImpute(CONVERT~.,data=clunk)
ntree      OOB      1      2
  300:  26.80%  3.83% 85.37%
ntree      OOB      1      2
  300:  18.56%  5.74% 51.22%
Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree,  : NA not

help with the usage of "randomForest"

2004 Mar 31

Dear all, Can anybody give me a hint about the following error message I got when using randomForest? I have a two-class classification problem. The data file "sample" is:
----------------------------------------------------------
  udomain.edu udomain.hcs hpclass
1      1.0000           1     not
2          NA           2     not
3          NA         0.8     not
4          NA         0.2      hp
5          NA         0.9      hp
------------------------------------------------------------
The

randomForest Tutorial

2008 Jul 22

I am new to R and I'd like to use the randomForest package for my thesis (identifying important variables for more detailed analysis with other software). I have found extremely well written and helpful information on the usage of R. Unfortunately, it seems to be very difficult to find similarly detailed tutorials for randomForest, and I just can't get it to work with the information on

na.action in randomForest --- Summary

2003 Aug 05

A few days ago I asked whether there were options other than na.action=na.fail for the R port of Breiman's randomForest; the function's help page did not say anything about other options. I have since discovered that a PDF document called "The randomForest Package" and made available by Andy Liaw (who made the tool available in R---thank you) does discuss an option. It is an implementation of

importing timestamp data into R

2007 Jan 04

I have a set of timestamp data in a text file that I would like to import into R for analysis. The timestamps are formatted as follows:
DT_1,DT_2
[2006/08/10 21:12:14 ],[2006/08/10 21:54:00 ]
[2006/08/10 20:42:00 ],[2006/08/10 22:48:00 ]
[2006/08/10 20:58:00 ],[2006/08/10 21:39:00 ]
[2006/08/04 12:15:24 ],[2006/08/04 12:20:00 ]
[2006/08/04 12:02:00 ],[2006/08/04 14:20:00 ]
I can get
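A minimal base-R sketch of one way to read such a file, assuming the layout is exactly as in the sample (comma-separated, each value wrapped in square brackets with a trailing space). The sample rows are inlined with `text =` so the example runs on its own; for a real file you would pass its path to `read.csv` instead. The helper name `parse_ts` is hypothetical, and the time zone is assumed to be UTC.

```r
# Sample rows from the question, inlined so the example is self-contained.
txt <- "DT_1,DT_2
[2006/08/10 21:12:14 ],[2006/08/10 21:54:00 ]
[2006/08/10 20:42:00 ],[2006/08/10 22:48:00 ]
[2006/08/04 12:02:00 ],[2006/08/04 14:20:00 ]"

raw_df <- read.csv(text = txt, stringsAsFactors = FALSE)

# Hypothetical helper: strip the brackets and surrounding spaces, then parse.
parse_ts <- function(x) {
  x <- gsub("^[[:space:]]*\\[|[[:space:]]*\\][[:space:]]*$", "", x)
  as.POSIXct(x, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")  # tz is an assumption
}

times <- data.frame(DT_1 = parse_ts(raw_df$DT_1), DT_2 = parse_ts(raw_df$DT_2))

# Elapsed time between the two columns, in minutes.
times$minutes <- as.numeric(difftime(times$DT_2, times$DT_1, units = "mins"))
print(times)
```

Once the columns are `POSIXct`, differences, sorting, and `cut()` by hour or day work directly on them.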

NA in R package randomForest

2012 Mar 26

I have a question regarding NA in randomForest (in R). I have a dataset which includes both numerical and non-numerical variables, and the data include some NAs. I tried to use na.roughfix, but then I get the error message "na.roughfix only works for numeric or factor". I also tried rfImpute, but this does not work either because I have some NAs in my response variable. Does anyone have som
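The error message points at the column types: na.roughfix accepts numeric or factor columns, so character columns have to be converted first. A base-R sketch of the preprocessing, using a small made-up data frame (`dat`, `y`, `x1`, `x2` are hypothetical names): convert character columns to factors, then drop rows whose response is missing, since those rows cannot supervise the forest anyway. The final `na.roughfix` call is left as a comment because it needs the randomForest package.

```r
# Hypothetical data: numeric and character predictors, plus a response with an NA.
dat <- data.frame(
  y  = factor(c("a", "b", NA, "a", "b")),
  x1 = c(1.5, NA, 3.0, 4.2, 5.1),
  x2 = c("red", "blue", NA, "red", "green"),
  stringsAsFactors = FALSE
)

# 1) Convert every character column to a factor.
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

# 2) Drop rows with a missing response before imputing the predictors.
dat <- dat[!is.na(dat$y), ]

str(dat)
# With library(randomForest) loaded, the remaining predictor NAs could then be
# filled with: dat_fixed <- na.roughfix(dat)
```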

Imputing data

2011 Dec 02

So I have a very big matrix of about 900 by 400 and there are a couple of NA in the list. I have used the following functions to impute the missing data data(pc) pc.na<-pc pc.roughfix <- na.roughfix(pc.na) pc.narf <- randomForest(pc.na, na.action=na.roughfix) yet it does not replace the NA in the list. Presently I want to replace the NA with maybe the mean of the rows or columns or
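Two points may help here. R functions return a new object rather than modifying their argument, so `pc.na` keeps its NAs; the filled version from `na.roughfix(pc.na)` is `pc.roughfix`. For replacing NAs with column means, a base-R sketch is below; the 900 x 400 matrix `m` is simulated as a stand-in for `pc`. Swapping `colMeans`/`"col"` for `rowMeans`/`"row"` gives row-mean imputation instead.

```r
set.seed(1)
# Simulated stand-in for the 900 x 400 matrix, with a few NAs sprinkled in.
m <- matrix(rnorm(900 * 400), nrow = 900, ncol = 400)
m[sample(length(m), 10)] <- NA

# Mean of each column, ignoring NAs.
col_means <- colMeans(m, na.rm = TRUE)

# (row, col) positions of every NA, then fill each with its column's mean.
idx <- which(is.na(m), arr.ind = TRUE)
m_filled <- m
m_filled[idx] <- col_means[idx[, "col"]]

sum(is.na(m_filled))
```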

anyone know why package "RandomForest" na.roughfix is so slow??

2010 Jun 30

Hi all, I am using the package "random forest" for random forest predictions. I like the package. However, I have fairly large data sets, and it can often take *hours* just to go through the "na.roughfix" call, which simply goes through and cleans up any NA values to either the median (numerical data) or the most frequent occurrence (factors). I am going to start
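The fill described above (median for numeric columns, most frequent level for factors) can be written column by column in base R. The sketch below uses a hypothetical `fast_roughfix` function; it is not the randomForest implementation, it breaks ties in the mode by taking the first level (which may differ from the package), and no speed comparison is claimed here.

```r
# Hypothetical column-wise median/mode fill for a data frame.
fast_roughfix <- function(dat) {
  dat[] <- lapply(dat, function(x) {
    miss <- is.na(x)
    if (!any(miss)) return(x)                 # nothing to fill in this column
    if (is.numeric(x)) {
      x[miss] <- median(x, na.rm = TRUE)      # numeric: column median
    } else if (is.factor(x)) {
      counts <- table(x)                      # table() ignores NA by default
      x[miss] <- names(counts)[which.max(counts)]  # factor: most frequent level
    } else {
      stop("only numeric or factor columns are supported")
    }
    x
  })
  dat
}

d <- data.frame(a = c(1, NA, 3, 10), b = factor(c("u", "v", "v", NA)))
fast_roughfix(d)
```

Here column `a` gets the median 3 and column `b` gets its most frequent level, `"v"`.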

Error while using rfImpute

2009 May 08

Dear Administrator, I am using linux (suse 10.2). While attempting rfImpute, I am getting the following error message: > Members <- rfImpute(Status ~ ., data = Members) Error in .C("classRF", x = x, xdim = as.integer(c(p, n)), y = as.integer(y), : C symbol name "classRF" not in DLL for package "randomForest". I need the help to sort out above error.

Party package: varimp(..., conditional=TRUE) error: term 1 would require 9e+12 columns

2011 Oct 14

I would like to build a forest of regression trees to see how well some covariates predict a response variable and to examine the importance of the covariates. I have a small number of covariates (8) and large number of records (27368). The response and all of the covariates are continuous variables. A cursory examination of the covariates does not suggest they are correlated in a simple fashion

caret: Errors with createGrid for rf (randomForest)

2013 Feb 12

When I try to create a grid of parameters for training with caret, I get various errors:
------------------------------------------------------------
> my_grid <- createGrid("rf")
Error in if (p <= len) { : argument is of length zero
> my_grid <- createGrid("rf", 4)
Error in if (p <= len) { : argument is of length zero
> my_grid <-

ROC curve in randomForest

2010 Apr 30

require(randomForest)
rf.pred<-predict(fit, valid, type="prob")
> rf.pred[1:20, ]
         0      1
16  0.0000 1.0000
23  0.3158 0.6842
43  0.3030 0.6970
52  0.0886 0.9114
55  0.1216 0.8784
75  0.0920 0.9080
82  0.4332 0.5668
120 0.2302 0.7698
128 0.1336 0.8664
147 0.4272 0.5728
148 0.0490 0.9510
153 0.0556 0.9444
161 0.0760 0.9240
162 0.4564 0.5436
172 0.5148 0.4852
176 0.1730
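With class probabilities like these, an ROC curve only needs the score for the positive class (here column "1") and the true labels. A base-R sketch follows; the `score` and `label` vectors are made up because the true labels of `valid` are not shown, and `roc_points` and `auc` are hypothetical helper names. The AUC uses the rank (Mann-Whitney) formula, with average ranks for tied scores.

```r
# Made-up scores (e.g. rf.pred[, "1"]) and true 0/1 labels.
score <- c(0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20)
label <- c(1,    1,    0,    1,    1,    0,    0,    0)

# Sweep the threshold from high to low and accumulate TPR and FPR.
roc_points <- function(score, label) {
  lab <- label[order(score, decreasing = TRUE)]
  tpr <- cumsum(lab == 1) / sum(label == 1)
  fpr <- cumsum(lab == 0) / sum(label == 0)
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# AUC = P(score of a random positive > score of a random negative).
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

pts <- roc_points(score, label)
plot(pts$fpr, pts$tpr, type = "s",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # chance line
auc(score, label)
```

For these made-up values, 14 of the 16 positive/negative pairs are ordered correctly, so the AUC is 0.875.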

Problems using rfImpute

2008 May 05

Hello R-user! I am running R 2.7.0 on a PowerBook (Tiger). (I am still an R and statistics beginner.) I tried rfImpute (randomForest) and, as far as I understood, it should replace NAs using a proximity matrix:
> set.seed(100000)
> Subset5Imputed<-rfImpute(Sex~., data=Subset5)
ntree      OOB      1      2
  300:  11.78% 12.36% 11.21%
ntree      OOB      1      2
  300:  12.07% 12.64%

[handling] Missing [values in randomForest]

2005 Sep 12

Hi Jan-Paul, You definitely want to be careful with na.omit in randomForest -- that wipes out any row with even one NA. If NAs are scattered throughout your dataset, na.omit might end up killing a lot of rows. Here's my usual MO for missing values: 1) "impute" in Hmisc fills in gaps with the mean, median, most common value, etc. 2) rfImpute: fits a forest on the rows available and
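A quick simulation shows how fast na.omit can shrink a data set. The sketch assumes 20 numeric columns, each missing 5% of its values at random and independently of the other columns (an assumption made for illustration, not data from the thread). Only 5% of the cells are missing, yet a row survives only if all 20 of its values are present, so roughly 0.95^20, about 36%, of the rows remain.

```r
set.seed(42)
n <- 1000
p <- 20
dat <- as.data.frame(matrix(rnorm(n * p), n, p))

# Knock out 5% of the values in every column, at random.
for (j in seq_len(p)) dat[sample(n, n * 0.05), j] <- NA

mean(is.na(dat))         # fraction of missing cells: 0.05
nrow(na.omit(dat)) / n   # fraction of rows kept by na.omit: roughly 0.36
```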
