Displaying 20 results from an estimated 1000 matches similar to: "randomForest: help with combine() function"
2012 Dec 03
1
How do I make R randomForest model size smaller?
I've been training randomForest models on 7 million rows of data (41
features). Here's an example call:
myModel <- randomForest(RESPONSE~., data=mydata, ntree=50, maxnodes=30)
I thought surely with only 50 trees and 30 terminal nodes that the memory
footprint of "myModel" would be small. But it's 65 megs in a dump file. The
object seems to be holding all sorts of
2005 Mar 23
4
sampling from a mixture distribution
Dear R users,
I would like to sample from a mixture distribution p1*f(x1)+p2*f(x2). I
usually sample variates from both distributions and weight them with their
respective probabilities, but someone told me that was wrong. What is the
correct way?
Vumani
2010 May 22
0
multiple imputation based on a condition
I would be grateful for any suggestions on the following.
I'm trying to impute data, where a fictitious dataset is defined as...
set.seed(110)
n <- 500
test <- data.frame(smoke_status = rbinom(n, 2, 0.6), smoke_amount =
rbinom(n, 2, 0.5), rf1 = rnorm(n), rf2 = rnorm(n), outcome = rbinom(n,
1, 0.3))
# smoke_status (0, 1, 2) is c("non-smoker", "ex-smoker",
2010 Jun 30
2
anyone know why package "RandomForest" na.roughfix is so slow??
Hi all,
I am using the package "random forest" for random forest predictions. I
like the package. However, I have fairly large data sets, and it can often
take *hours* just to go through the "na.roughfix" call, which simply goes
through and cleans up any NA values to either the median (numerical data) or
the most frequent occurrence (factors).
I am going to start
2005 Oct 11
1
a problem in random forest
Hi, there:
I spent some time on this but I think I really cannot figure it out, maybe I
missed something here:
my data looks like this:
> dim(trn3)
[1] 7361 209
> dim(val3)
[1] 7427 209
> mg.rf2<-randomForest(x=trn3[,1:208], y=trn3[,209], data=trn3, xtest=val3[,
1:208], ytest=val3[,209], importance=T)
my test data has 7427 observations but after prediction,
> dim(mg.rf2$votes)
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package.
Here is a sample...
clunk.roughfix<-na.roughfix(clunk)
>
> clunk.impute<-rfImpute(CONVERT~.,data=clunk)
ntree OOB 1 2
300: 26.80% 3.83% 85.37%
ntree OOB 1 2
300: 18.56% 5.74% 51.22%
Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree,
:
NA not
2012 Mar 26
1
NA in R package randomForest
I have a question regarding NA in randomForest (in R). I have a dataset
which includes both numerical and non-numerical variables, and the data
includes some NA. I tried to use na.roughfix but then I get an error
message "na.roughfix only works for numeric or factor". I also tried
rfImpute but this does not work either because I have some NA in my
response variable. Does anyone have som
2012 Jun 06
0
randomForest Species Distribution Modelling
Hi,
I apologise if this is a rudimentary question and long-winded but I just
wanted to let ye know where I'm coming from. I'm new to R and I'm trying to
use the 'randomForest' package to classify and predict. The Error message
that is troubling me is:
> pr<-predict(predictors,rf1, ext=ext)
Error in x[...] <- m : NAs are not allowed in subscripted assignments
In
2003 Aug 05
1
na.action in randomForest --- Summary
A few days ago I asked whether there were options other than
na.action=na.fail for the R port of Breiman's randomForest; the function's
help page did not say anything about other options.
I have since discovered that a pdf document called "The randomForest
Package? and made available by Andy Liaw (who made the tool available in
R---thank you) does discuss an option. It is an implementation of
2012 May 05
1
No Data in randomForest predict
I would like to ask a general question about the randomForest predict
function and how it handles No Data values. I understand that you can omit
No Data values while developing the randomForest object, but how does it
handle No Data in the prediction phase? I would like the output to be NA
if any (not just all) of the input data have an NA value. It is not clear
to me if this is the default or
2006 Dec 18
1
Memory problem on a linux cluster using a large data set
Hello,
I have a large data set 320.000 rows and 1000 columns. All the data has the values 0,1,2.
I wrote a script to remove all the rows with more than 46 missing values. This works perfectly on a smaller dataset, but when I try to run it on the larger data set I get an error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might
2008 Apr 29
1
randomForest and ordered factors
Hello R-user!
I am running R 2.7.0 on a Power Book (Tiger). (I am still a beginner in R
and statistics.)
I try to find the most important variables to divide my dataset as
given in a categorical variable.
code:
Test.rf4<-randomForest(Sex~.,na.action=na.roughfix, data=Subset4,
importance=TRUE, proximity=TRUE, ntree=10000, do.trace=1000,
keep.forest=FALSE)
My dataset contains also ordered
2011 Dec 02
2
Imputing data
So I have a very big matrix of about 900 by 400 and there are a couple of NA
in the list. I have used the following functions to impute the missing data
data(pc)
pc.na<-pc
pc.roughfix <- na.roughfix(pc.na)
pc.narf <- randomForest(pc.na, na.action=na.roughfix)
yet it does not replace the NA in the list. Presently I want to replace the
NA with maybe the mean of the rows or columns or
2011 Jan 03
1
randomForest speed improvements
Hi there,
We're trying to use randomForest to do some predictions. The test-harness
for our code is pretty straightforward:
library ('randomForest');
data202 <- read.csv ("random.csv", header=TRUE);
x<- data202[1:50000,1:6];
y<- data202[1:50000,8];
y<- y[,drop=TRUE];
x2 <- data202[50001:60000,1:6];
y2 <- data202[50001:60000,8];
y2 <-
2007 Jan 10
1
Fw: Memory problem on a linux cluster using a large data set [Broadcast]
Hi
I listened to all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying "it cannot allocate a vector of that size 1240 kb". I don't want to cut my data into smaller pieces because we are looking at interaction. So are there any other options for me to try out, or should I wait for the development of more advanced computers?
2011 Dec 15
2
Random Forest Reading N/A's, I don't see them
After checking the original data in Excel for blanks and running summary(cm3)
to identify any null values in my data, I'm unable to identify any instances.
Yet when I attempted to use the data in Random Forest, I get the following
error. Is there something that Random Forest is reading as null which is not
actually null? Is there a better way to check for this?
> library(randomForest)
>
2011 Mar 07
2
use "caret" to rank predictors by random forest model
Hi,
I'm using package "caret" to rank predictors using a random forest model and draw a predictor importance plot. I used the commands below:
rf.fit<-randomForest(x,y,ntree=500,importance=TRUE)
## "x" is a matrix whose columns are predictors, "y" is a binary response vector
## Then I got the ranked predictors by ranking
2011 Feb 10
2
R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?
Should one expect minor numerical differences between 64bit and 32bit R on
Windows? Hunting around the lists I've not been able to find a definitive
answer yet. Seems plausible using different precision arithmetic, but wanted
to confirm from those who might know for sure.
BACKGROUND
A colleague was trying to replicate some modelling results (from a soon to
be published book) using rpart, ada,
2008 Mar 09
1
sampsize in Random Forests
Hi all,
I have a dataset where each point is assigned to a class A, B, C, or
D. Each point is also assigned to a study site. Each study site is
coded with a number ranging between 1-100. This information is stored
in the vector studySites.
I want to run randomForest using stratified sampling, so I chose the option
strata = factor(studySites)
But I am not sure how to control the number of
2012 Aug 15
0
[LLVMdev] clang promoting local to global
On Wed, Aug 15, 2012 at 4:10 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> On Wed, Aug 15, 2012 at 3:17 PM, Ryan Taylor <ryta1203 at gmail.com> wrote:
> > So there are some #define (defined outside the function scope) that use
> > it_tab that are used inside the function, is this why it is promoting it
> to
> > a global?
>
> Macros shouldn't