Displaying 20 results from an estimated 6000 matches similar to: "RandomForest and Missing Values"
2013 Feb 03
3
RandomForest, Party and Memory Management
Dear All,
For a data mining project, I am relying heavily on the RandomForest and
Party packages.
Due to the large size of the data set, I often have memory problems (in
particular with the Party package; RandomForest seems to use less memory).
I really have two questions at this point
1) Please see how I am using the Party and RandomForest packages. Any
comment is welcome and useful.
2013 Feb 14
1
party::cforest - predict?
What is the function call interface for predict in the package party for
cforest? I am looking at the documentation (the vignette) and ?cforest and
from the examples I see that one can call the function predict on a cforest
classifier. The method predict seems to be a method of the class
RandomForest, objects of which are returned by cforest.
---------------------------
> cf.model =
2007 Jan 04
3
randomForest and missing data
Does anyone know a reason why, in principle, a call to randomForest
cannot accept a data frame with missing predictor values? If each
individual tree is built using CART, then it seems like this
should be possible. (I understand that one may impute missing values
using rfImpute or some other method, but I would like to avoid doing
that.)
If this functionality were available, then when the trees
2013 Mar 24
3
Parallelizing GBM
Dear All,
I am far from being a guru about parallel programming.
Most of the time, I rely on randomForest for data mining large datasets.
I would like to give a try also to the gradient boosted methods in GBM,
but I have a need for parallelization.
I normally rely on gbm.fit for speed reasons, and I usually call it this
way
gbm_model <- gbm.fit(trainRF,prices_train,
offset = NULL,
misc =
2013 Mar 24
1
Random Forest, Giving More Importance to Some Data
Dear All,
I am using randomForest to predict the final selling price of some items.
As it often happens, I have a lot of (noisy) historical data, but the
question is not so much about data cleaning.
The dataset for which I need to carry out some predictions consists of fairly
recent sales or even some sales that will take place in the near future.
As a consequence, historical data should be somehow
2003 Aug 26
1
rfImpute (for randomForest) crashed
In trying to execute this line in R (Version 1.7.1 (2003-06-16), under
windows XP pro), with the randomForest library (about two weeks old) loaded,
the program crashed:
bost4rf <- rfImpute(TargetDensity~.,data=bost4rf0)
Specifically, an XP dialog box popped up, saying "R for windows GUI
front-end has encountered a problem and needs to close." That was the
dialog asking whether I
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package.
Here is a sample...
clunk.roughfix<-na.roughfix(clunk)
>
> clunk.impute<-rfImpute(CONVERT~.,data=clunk)
ntree OOB 1 2
300: 26.80% 3.83% 85.37%
ntree OOB 1 2
300: 18.56% 5.74% 51.22%
Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree,
:
NA not
2004 Mar 31
3
help with the usage of "randomForest"
Dear all,
Can anybody give me a hint about the following error message I got when using
randomForest?
I have a two-class classification problem. The data file "sample" is:
----------------------------------------------------------
  udomain.edu udomain.hcs hpclass
1      1.0000           1     not
2          NA           2     not
3          NA         0.8     not
4          NA         0.2      hp
5          NA         0.9      hp
------------------------------------------------------------
The
2008 Jul 22
2
randomForest Tutorial
I am new to R and I'd like to use the randomForest package for my thesis
(identifying important variables for more detailed analysis with other
software). I have found extremely well written and helpful information on
the usage of R.
Unfortunately it seems to be very difficult to find similarly detailed
tutorials for randomForest, and I just can't get it to work with the
information on
2003 Aug 05
1
na.action in randomForest --- Summary
A few days ago I asked whether there were options other than
na.action=na.fail for the R port of Breiman's randomForest; the function's
help page did not say anything about other options.
I have since discovered that a pdf document called "The randomForest
Package" and made available by Andy Liaw (who made the tool available in
R---thank you) does discuss an option. It is an implementation of
2007 Jan 04
2
importing timestamp data into R
I have a set of timestamp data in a text file that I would like
to import into R for analysis.
The timestamps are formatted as follows:
DT_1,DT_2
[2006/08/10 21:12:14 ],[2006/08/10 21:54:00 ]
[2006/08/10 20:42:00 ],[2006/08/10 22:48:00 ]
[2006/08/10 20:58:00 ],[2006/08/10 21:39:00 ]
[2006/08/04 12:15:24 ],[2006/08/04 12:20:00 ]
[2006/08/04 12:02:00 ],[2006/08/04 14:20:00 ]
I can get
2012 Mar 26
1
NA in R package randomForest
I have a question regarding NA in randomForest (in R). I have a dataset
which includes both numerical and non-numerical variables, and the data
includes some NA. I tried to use na.roughfix but then I get an error
message "na.roughfix only works for numeric or factor". I also tried
rfImpute but this does not work either because I have some NA in my
response variable. Does anyone have som
2011 Dec 02
2
Imputing data
So I have a very big matrix of about 900 by 400 and there are a couple of NA
in the list. I have used the following functions to impute the missing data
data(pc)
pc.na<-pc
pc.roughfix <- na.roughfix(pc.na)
pc.narf <- randomForest(pc.na, na.action=na.roughfix)
yet it does not replace the NA in the list. Presently I want to replace the
NA with maybe the mean of the rows or columns or
2010 Jun 30
2
anyone know why package "RandomForest" na.roughfix is so slow??
Hi all,
I am using the package "random forest" for random forest predictions. I
like the package. However, I have fairly large data sets, and it can often
take *hours* just to go through the "na.roughfix" call, which simply goes
through and cleans up any NA values to either the median (numerical data) or
the most frequent occurrence (factors).
I am going to start
2009 May 08
1
Error while using rfImpute
Dear Administrator,
I am using linux (suse 10.2). While attempting rfImpute, I am getting the
following error message:
> Members <- rfImpute(Status ~ ., data = Members)
Error in .C("classRF", x = x, xdim = as.integer(c(p, n)), y =
as.integer(y), :
C symbol name "classRF" not in DLL for package "randomForest".
I need help sorting out the above error.
2011 Oct 14
1
Party package: varimp(..., conditional=TRUE) error: term 1 would require 9e+12 columns
I would like to build a forest of regression trees to see how well some
covariates predict a response variable and to examine the importance of the
covariates. I have a small number of covariates (8) and large number of
records (27368). The response and all of the covariates are continuous
variables.
A cursory examination of the covariates does not suggest they are correlated
in a simple fashion
2013 Feb 12
1
caret: Errors with createGrid for rf (randomForest)
When I try to create a grid of parameters for training with caret I get
various errors:
------------------------------------------------------------
> my_grid <- createGrid("rf")
Error in if (p <= len) { : argument is of length zero
> my_grid <- createGrid("rf", 4)
Error in if (p <= len) { : argument is of length zero
> my_grid <-
2010 Apr 30
0
ROC curve in randomForest
require(randomForest)
rf.pred<-predict(fit, valid, type="prob")
> rf.pred[1:20, ]
0 1
16 0.0000 1.0000
23 0.3158 0.6842
43 0.3030 0.6970
52 0.0886 0.9114
55 0.1216 0.8784
75 0.0920 0.9080
82 0.4332 0.5668
120 0.2302 0.7698
128 0.1336 0.8664
147 0.4272 0.5728
148 0.0490 0.9510
153 0.0556 0.9444
161 0.0760 0.9240
162 0.4564 0.5436
172 0.5148 0.4852
176 0.1730
2008 May 05
1
Problems using rfImpute
Hello R-user!
I am running R 2.7.0 on a Power Book (Tiger). (I am still an R and
statistics beginner)
I tried rfImpute (randomForest) and, as far as I understand, it should
replace NAs using a proximity matrix:
> set.seed(100000)
> Subset5Imputed<-rfImpute(Sex~., data=Subset5)
ntree OOB 1 2
300: 11.78% 12.36% 11.21%
ntree OOB 1 2
300: 12.07% 12.64%
2005 Sep 12
0
[handling] Missing [values in randomForest]
Hi Jan-Paul,
You definitely want to be careful with na.omit in randomForest -- that
wipes out any row with even one NA. If NAs are sprawled throughout your
dataset, na.omit might end up killing a lot of rows. Here's my usual MO
for missing values:
1) "impute" in Hmisc fills in gaps with the mean, median, most common
value, etc.
2) rfImpute: fits a forest on the rows available and