thr3ads.net - search: "roughfix"

anyone know why package "RandomForest" na.roughfix is so slow??

2010 Jun 30

2

anyone know why package "RandomForest" na.roughfix is so slow??

Hi all, I am using the package "random forest" for random forest predictions. I like the package. However, I have fairly large data sets, and it can often take *hours* just to go through the "na.roughfix" call, which simply goes through and cleans up any NA values to either the median (numerical data) or the most frequent occurrence (factors). I am going to start doing some comparisons between na.roughfix() and some apply() functions which, it seems, are able to do the same job more quickl...
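As a rough illustration of the rule described in this post (median for numeric columns, most frequent level for factors), here is a minimal base-R sketch built on lapply(). The names impute_column and rough_impute are hypothetical and are not part of the randomForest package. Ties between levels go to the first level here, and whether this runs faster than na.roughfix() on a given data set has not been measured.

```r
# Hypothetical helper (not from randomForest): fill the NAs in one column.
impute_column <- function(col) {
  if (!anyNA(col)) return(col)                 # nothing to do
  if (is.numeric(col)) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
  } else if (is.factor(col)) {
    counts <- table(col)                       # table() ignores NA by default
    col[is.na(col)] <- names(counts)[which.max(counts)]
  }
  col                                          # other column types are returned unchanged
}

# Hypothetical wrapper: apply the helper to every column of a data frame.
rough_impute <- function(df) {
  df[] <- lapply(df, impute_column)            # df[] keeps the data.frame structure
  df
}

# Toy data, made up for illustration.
toy <- data.frame(
  x = c(1, NA, 3, 10),
  g = factor(c("a", "b", NA, "b"))
)
fixed <- rough_impute(toy)
print(fixed)
```

Working column by column keeps each column's type intact, which an apply() over a mixed-type data frame would not, since apply() first coerces the data frame to a matrix.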

Memory problem on a linux cluster using a large data set

2006 Dec 18

1

Memory problem on a linux cluster using a large data set

...(46) NA's SNP$total.NAs=NULL # remove added column with sum of NA's SNP = t(as.matrix(SNP)) # transpose rows and columns set.seed(1) snp.na<-SNP snp.roughfix<-na.roughfix(snp.na) fSNP<-factor(snp.roughfix[, 1]) # Assigns factor to case control status snp.narf<- randomForest(snp.roughfix[,-1], fSNP, na.action=na.roughfix, ntree=500, mtry=10, importance=TRUE, keep.forest=FALSE, do.trace...

Imputing data

2011 Dec 02

2

Imputing data

So I have a very big matrix of about 900 by 400 and there are a couple of NAs in it. I have used the following functions to impute the missing data: data(pc) pc.na<-pc pc.roughfix <- na.roughfix(pc.na) pc.narf <- randomForest(pc.na, na.action=na.roughfix) yet it does not replace the NAs. Presently I want to replace the NAs with maybe the mean of the rows or columns or some type of correlation. Any help would be appreciated. ...
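For the row-mean or column-mean replacement asked about here, a minimal base-R sketch on a numeric matrix might look like the following. The toy matrix m stands in for the poster's data and its values are made up. The sketch does not address why the posted randomForest() call left the NAs in place.

```r
# Toy numeric matrix (filled column-wise); values are made up for illustration.
m <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 3)

# (row, col) position of every NA, as a two-column index matrix.
na_pos <- which(is.na(m), arr.ind = TRUE)

# Option 1: replace each NA with the mean of its column.
col_means <- colMeans(m, na.rm = TRUE)
m_col <- m
m_col[na_pos] <- col_means[na_pos[, "col"]]

# Option 2: replace each NA with the mean of its row.
row_means <- rowMeans(m, na.rm = TRUE)
m_row <- m
m_row[na_pos] <- row_means[na_pos[, "row"]]

print(m_col)
print(m_row)
```

Indexing with the two-column matrix from which(..., arr.ind = TRUE) assigns all replacements in one step, without an explicit loop over rows or columns.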

Fw: Memory problem on a linux cluster using a large data set [Broadcast]

2007 Jan 10

1

Fw: Memory problem on a linux cluster using a large data set [Broadcast]

...> > snp.na<-SNP > > R might be clever enough to figure out that this simple > assignment does not trigger a copy. But it probably means > that any subsequent modification of snp.na or SNP *will* > trigger a copy, so avoid the assignment if possible. > > > snp.roughfix<-na.roughfix(snp.na) > > > fSNP<-factor(snp.roughfix[, 1]) # Assigns > factor to case control status > > > > snp.narf<- randomForest(snp.roughfix[,-1], fSNP, > > na.action=na.roughfix, ntree=500,...

rfImpute

2007 Aug 10

1

rfImpute

I am having trouble with the rfImpute function in the randomForest package. Here is a sample... clunk.roughfix<-na.roughfix(clunk) > > clunk.impute<-rfImpute(CONVERT~.,data=clunk) ntree OOB 1 2 300: 26.80% 3.83% 85.37% ntree OOB 1 2 300: 18.56% 5.74% 51.22% Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, : NA not permitted...

Memory problem on a linux cluster using a large data set [Broadcast]

2006 Dec 21

1

Memory problem on a linux cluster using a large data set [Broadcast]

...> > snp.na<-SNP > > R might be clever enough to figure out that this simple > assignment does not trigger a copy. But it probably means > that any subsequent modification of snp.na or SNP *will* > trigger a copy, so avoid the assignment if possible. > > > snp.roughfix<-na.roughfix(snp.na) > > > fSNP<-factor(snp.roughfix[, 1]) # Assigns > factor to case control status > > > > snp.narf<- randomForest(snp.roughfix[,-1], fSNP, > > na.action=na.roughfix, ntree=500,...

NA in R package randomForest

2012 Mar 26

1

NA in R package randomForest

I have a question regarding NA in randomForest (in R). I have a dataset which includes both numerical and non-numerical variables, and the data include some NAs. I tried to use na.roughfix, but then I get the error message "na.roughfix only works for numeric or factor". I also tried rfImpute, but this does not work either because I have some NAs in my response variable. Does anyone have some tips on how I can deal with this?
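Going only by the wording of the quoted error message, one possible preparation step is to convert character columns to factors and to drop rows whose response is missing before imputing the predictors. The sketch below uses base R only. The data frame dat and its column names are made up, and whether this resolves the poster's actual case is an assumption.

```r
# Toy mixed-type data; column names and values are made up for illustration.
dat <- data.frame(
  y   = factor(c("yes", "no", NA, "yes")),   # response with one NA
  num = c(1.5, NA, 2.5, 3.5),
  txt = c("a", NA, "b", "a"),
  stringsAsFactors = FALSE                   # keep txt as character on purpose
)

# Convert every character column to a factor, so that all columns
# are either numeric or factor.
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

# Drop rows whose response is missing; imputing a response is a separate question.
dat_complete_y <- dat[!is.na(dat$y), , drop = FALSE]

str(dat_complete_y)
```

The resulting data frame still has NAs in its predictors, which could then be handed to an imputation step of the reader's choice.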

na.action in randomForest --- Summary

2003 Aug 05

1

na.action in randomForest --- Summary

...n that categorical. My impression is that because of the randomness and the many trees grown, filling in missing values with a sensible values does not effect accuracy much." (from his report, "Manual On Setting Up, Using, And Understanding Random Forests V3.1"). I now plan to try the na.roughfix option from Liaw's package. Thanks to Uwe Ligges and Brian Ripley for their replies to my posting. Dave Parkhurst

randomForest: help with combine() function

2010 Dec 11

1

randomForest: help with combine() function

...st[[i]]$votes), 0, rflist[[i]]$votes) : non-conformable arrays In addition: Warning message: In rf$oob.times + rflist[[i]]$oob.times : longer object length is not a multiple of shorter object length Both RF models use the same variables, although the NAs in both models likely differ (using na.roughfix in both models). I assume this is part of the reason that my arrays are "non-conformable". If so, does anyone have any suggestions on how to combine in such a situation? How similar do RFs have to be in order to combine? Cheers

Rserve/RandomForest does not work with a CSV?

2009 Jan 10

0

Rserve/RandomForest does not work with a CSV?

Hi all, We're using Rserve and RandomForest to do classification from within a Java program. The total is about 4 lines of R code: library('randomForest') x y future fit<-randomForest(x,y,no.action=na.roughfix,importance=T,proximity=T) p<-predict(fit, future) What is very frustrating is that we have tried this two different ways (both work in R): 1. Load x, y, and future from a CSV. If I do this, Rserve throws an error when randomForest() is called. 2. Load x, y, and future by using arrays, and...

randomForest and ordered factors

2008 Apr 29

1

randomForest and ordered factors

Hello R-user! I am running R 2.7.0 on a Power Book (Tiger). (I am still R and statistics beginner) I try to find the most important variables to divide my dataset as given in a categorical variable. code: Test.rf4<-randomForest(Sex~.,na.action=na.roughfix, data=Subset4, importance=TRUE, proximity=TRUE, ntree=10000, do.trace=1000, keep.forest=FALSE) My dataset contains also ordered factors classified as such. Is randomForest able to deal with it, does it change anything or is there no difference in using factors or ordered factors? Many thank...

randomForest speed improvements

2011 Jan 03

1

randomForest speed improvements

...;); data202 <- read.csv ("random.csv", header=TRUE); x<- data202[1:50000,1:6]; y<- data202[1:50000,8]; y<- y[,drop=TRUE]; x2 <- data202[50001:60000,1:6]; y2 <- data202[50001:60000,8]; y2 <- y2[,drop=TRUE]; RFobject <- randomForest(x,y,na.action=na.roughfix); p <- predict (RFobject, x2); In this case, the CSV contains 10 columns, of which 1-6 are numeric in nature (day of week, week of month, etc...) and column 8 is the target (sales, a numeric number). randomForest does fine with the data, our issue is how long it takes. In this case, about 5...

R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?

2011 Feb 10

2

R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?

...23)]) print(model) On 32bit: Train Error: 0.057 On 64bit: Train Error: 0.055 Changing the seed to 42, for example, brings them into sync. library(randomForest) set.seed(41) model <- randomForest(RainTomorrow ~ ., data=weather[-c(1, 2, 23)], importance=TRUE, na.action=na.roughfix) print(model) On 32bit: OOB estimate of error rate: 12.84% On 64bit: OOB estimate of error rate: 11.75% > sessionInfo() R version 2.12.1 (2010-12-16) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 [3] LC_MONETARY=Eng...

Mean or mode imputation fro missing values

2011 Oct 11

1

Mean or mode imputation fro missing values

Dear R experts, I have a large database made up of mixed data types (numeric, character, factor, ordinal factor) with missing values, and I am looking for a package that would help me impute the missing values using either the mean if numerical or the mode if character/factor. I maybe could use replace like this: df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE) And go through all the many

installing problems repeated.tgz linux

2004 Jul 26

5

installing problems repeated.tgz linux

Hi, I tried several possibilities and looked in the archive, but had no success installing J. Lindsey's useful "library repeated" on my Linux system (SuSE 9.0 with kernel 2.6.7, R 1.9.1). P.S. On Windows it works fine. Many thanks for help, Christian chris at linux:/space/downs> R CMD INSTALL - l /usr/lib/R/library repeated WARNING: invalid package '-' WARNING:

new version of randomForest (4.0-7)

2004 Jan 12

0

new version of randomForest (4.0-7)

...e() function for extracting the importance measure. o The predict() method has an option to return predictions by the component trees. o There is a new getTree() function for looking at one of the trees in the forest. o For dealing with missing values in the predictor variables, there are na.roughfix() and rfImpute(), which correspond to the `missquick' and `missright' options in Breiman's V4/V5 code. Both works for classification as well as regression. o There is an experimental bias reduction step in regression (the corr.bias argument in randomForest) that could be very effecti...

randomForest 4.3-0 released

2004 Jul 08

0

randomForest 4.3-0 released

...move rows with NAs from the data frame given. * For regression, if proximity=FALSE, an n by n array of integers is erroneously allocated but not used (it's only used for proximity calculation, so not needed otherwise). * Updated combine() to conform to the new randomForest object. * na.roughfix() was not working correctly for matrices, which in turns causes problem in rfImpute(). Changes in 4.1-0: * In randomForest(), if sampsize is given, the sampling is now done without replacement, in addition to stratified by class. Therefore sampsize can not be larger than the class freq...

help with the usage of "randomForest"

2004 Mar 31

3

help with the usage of "randomForest"

Dear all, Can anybody give me some hint on the following error msg I got with using randomForest? I have two-class classification problem. The data file "sample" is: ---------------------------------------------------------- udomain.edu udomain.hcs hpclass 1 1.0000 1 not 2 NA 2 not 3 NA 0.8 not 4 NA 0.2 hp 5 NA 0.9 hp ------------------------------------------------------------ The
