thr3ads.net - similar to: "randomForest: help with combine() function"

Displaying 20 results from an estimated 1000 matches similar to: "randomForest: help with combine() function"

How do I make R randomForest model size smaller?

2012 Dec 03

I've been training randomForest models on 7 million rows of data (41 features). Here's an example call: myModel <- randomForest(RESPONSE~., data=mydata, ntree=50, maxnodes=30) I thought surely with only 50 trees and 30 terminal nodes that the memory footprint of "myModel" would be small. But it's 65 megs in a dump file. The object seems to be holding all sorts of

sampling from a mixture distribution

2005 Mar 23

Dear R users, I would like to sample from a mixture distribution p1*f(x1)+p2*f(x2). I usually sample variates from both distributions and weight them with their respective probabilities, but someone told me that was wrong. What is the correct way? Vumani

multiple imputation based on a condition

2010 May 22

I would be grateful for any suggestions on the following. I'm trying to impute data, where a fictitious dataset is defined as... set.seed(110) n <- 500 test <- data.frame(smoke_status = rbinom(n, 2, 0.6), smoke_amount = rbinom(n, 2, 0.5), rf1 = rnorm(n), rf2 = rnorm(n), outcome = rbinom(n, 1, 0.3)) # smoke_status (0, 1, 2) is c("non-smoker", "ex-smoker",

anyone know why package "RandomForest" na.roughfix is so slow??

2010 Jun 30

Hi all, I am using the package "random forest" for random forest predictions. I like the package. However, I have fairly large data sets, and it can often take *hours* just to go through the "na.roughfix" call, which simply goes through and cleans up any NA values to either the median (numerical data) or the most frequent occurrence (factors). I am going to start

a problem in random forest

2005 Oct 11

Hi, there: I spent some time on this but I think I really cannot figure it out, maybe I missed something here: my data looks like this: > dim(trn3) [1] 7361 209 > dim(val3) [1] 7427 209 > mg.rf2<-randomForest(x=trn3[,1:208], y=trn3[,209], data=trn3, xtest=val3[, 1:208], ytest=val3[,209], importance=T) my test data has 7427 observations but after prediction, > dim(mg.rf2$votes)

rfImpute

2007 Aug 10

I am having trouble with the rfImpute function in the randomForest package. Here is a sample... clunk.roughfix<-na.roughfix(clunk) > > clunk.impute<-rfImpute(CONVERT~.,data=clunk) ntree OOB 1 2 300: 26.80% 3.83% 85.37% ntree OOB 1 2 300: 18.56% 5.74% 51.22% Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, : NA not

NA in R package randomForest

2012 Mar 26

I have a question regarding NA in randomForest (in R). I have a dataset which includes both numerical and non-numerical variables, and the data includes some NA. I tried to use na.roughfix but then I get an error message "na.roughfix only works for numeric or factor". I also tried rfImpute but this does not work either because I have some NA in my response variable. Does anyone have som

randomForest Species Distribution Modelling

2012 Jun 06

Hi, I apologise if this is a rudimentary question and long-winded, but I just wanted to let ye know where I'm coming from. I'm new to R and I'm trying to use the 'randomForest' package to classify and predict. The error message that is troubling me is: > pr<-predict(predictors,rf1, ext=ext) Error in x[...] <- m : NAs are not allowed in subscripted assignments In

na.action in randomForest --- Summary

2003 Aug 05

A few days ago I asked whether there were options other than na.action=na.fail for the R port of Breiman's randomForest; the function's help page did not say anything about other options. I have since discovered that a pdf document called "The randomForest Package" and made available by Andy Liaw (who made the tool available in R---thank you) does discuss an option. It is an implementation of

No Data in randomForest predict

2012 May 05

I would like to ask a general question about the randomForest predict function and how it handles No Data values. I understand that you can omit No Data values while developing the randomForest object, but how does it handle No Data in the prediction phase? I would like the output to be NA if any (not just all) of the input data have an NA value. It is not clear to me if this is the default or

Memory problem on a linux cluster using a large data set

2006 Dec 18

Hello, I have a large data set with 320.000 rows and 1000 columns. All the data has the values 0,1,2. I wrote a script to remove all the rows with more than 46 missing values. This works perfectly on a smaller dataset. But when I try to run it on the larger data set, I get an error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might

randomForest and ordered factors

2008 Apr 29

Hello R-users! I am running R 2.7.0 on a PowerBook (Tiger). (I am still an R and statistics beginner.) I am trying to find the most important variables for dividing my dataset according to a categorical variable. code: Test.rf4<-randomForest(Sex~.,na.action=na.roughfix, data=Subset4, importance=TRUE, proximity=TRUE, ntree=10000, do.trace=1000, keep.forest=FALSE) My dataset contains also ordered

Imputing data

2011 Dec 02

So I have a very big matrix of about 900 by 400 and there are a couple of NA in the list. I have used the following functions to impute the missing data data(pc) pc.na<-pc pc.roughfix <- na.roughfix(pc.na) pc.narf <- randomForest(pc.na, na.action=na.roughfix) yet it does not replace the NA in the list. Presently I want to replace the NA with maybe the mean of the rows or columns or

randomForest speed improvements

2011 Jan 03

Hi there, We're trying to use randomForest to do some predictions. The test-harness for our code is pretty straightforward: library ('randomForest'); data202 <- read.csv ("random.csv", header=TRUE); x<- data202[1:50000,1:6]; y<- data202[1:50000,8]; y<- y[,drop=TRUE]; x2 <- data202[50001:60000,1:6]; y2 <- data202[50001:60000,8]; y2 <-

Fw: Memory problem on a linux cluster using a large data set [Broadcast]

2007 Jan 10

Hi, I listened to all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying "it cannot allocate a vector of that size 1240 kb". I don't want to cut my data into smaller pieces because we are looking at interaction. So are there any other options for me to try out, or should I wait for the development of more advanced computers!

Random Forest Reading N/A's, I don't see them

2011 Dec 15

After checking the original data in Excel for blanks and running Summary(cm3) to identify any null values in my data, I'm unable to identify any instances. Yet when I attempted to use the data in Random Forest, I got the following error. Is there something that Random Forest is reading as null which is not actually null? Is there a better way to check for this? > library(randomForest) >

use "caret" to rank predictors by random forest model

2011 Mar 07

Hi, I'm using the package "caret" to rank predictors using a random forest model and draw a predictor importance plot. I used the commands below: rf.fit<-randomForest(x,y,ntree=500,importance=TRUE) ## "x" is matrix whose columns are predictors, "y" is a binary response vector ## Then I got the ranked predictors by ranking

R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?

2011 Feb 10

Should one expect minor numerical differences between 64bit and 32bit R on Windows? Hunting around the lists I've not been able to find a definitive answer yet. Seems plausible using different precision arithmetic, but wanted to confirm from those who might know for sure. BACKGROUND A colleague was trying to replicate some modelling results (from a soon to be published book) using rpart, ada,

sampsize in Random Forests

2008 Mar 09

Hi all, I have a dataset where each point is assigned to a class A, B, C, or D. Each point is also assigned to a study site. Each study site is coded with a number ranging between 1-100. This information is stored in the vector studySites. I want to run randomForests using stratified sampling, so I chose the option strata = factor(studySites) But I am not sure how to control the number of

[LLVMdev] clang promoting local to global

2012 Aug 15

On Wed, Aug 15, 2012 at 4:10 PM, Eli Friedman <eli.friedman at gmail.com>wrote: > On Wed, Aug 15, 2012 at 3:17 PM, Ryan Taylor <ryta1203 at gmail.com> wrote: > > So there are some #define (defined outside the function scope) that use > > it_tab that are used inside the function, is this why it is promoting it > to > > a global? > > Macros shouldn't
