Displaying 20 results from an estimated 23 matches for "roughfix".
2010 Jun 30
2
anyone know why package "RandomForest" na.roughfix is so slow??
Hi all,
I am using the package "random forest" for random forest predictions. I
like the package. However, I have fairly large data sets, and it can often
take *hours* just to go through the "na.roughfix" call, which simply goes
through and cleans up any NA values to either the median (numerical data) or
the most frequent occurrence (factors).
I am going to start doing some comparisons between na.roughfix() and
some apply() functions which, it seems, are able to do the same job more
quickl...
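A minimal base-R sketch of the kind of column-wise replacement described above, for timing against na.roughfix(). The helper name rough_impute and the toy data are hypothetical and are not part of the randomForest package. This version breaks ties for the most frequent level by taking the first one, which may differ from what na.roughfix() does.

```r
# Hypothetical helper (not from randomForest): column-wise rough imputation.
# Numeric columns get their median; factor columns get their most frequent level.
rough_impute <- function(df) {
  df[] <- lapply(df, function(col) {
    miss <- is.na(col)
    if (!any(miss)) return(col)            # nothing to fill in this column
    if (is.numeric(col)) {
      col[miss] <- median(col, na.rm = TRUE)
    } else if (is.factor(col)) {
      counts <- table(col)                 # table() ignores NA by default
      col[miss] <- names(counts)[which.max(counts)]
    }
    col                                    # other column types are left untouched
  })
  df
}

# Made-up toy data, for illustration only
toy <- data.frame(
  x = c(1, NA, 3, 10),
  g = factor(c("a", "b", NA, "b"))
)
fixed <- rough_impute(toy)
print(fixed)
```

Wrapping each variant in system.time() on the same data is one simple way to compare it with na.roughfix().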
2006 Dec 18
1
Memory problem on a linux cluster using a large data set
...(46) NA's
SNP$total.NAs=NULL # remove added column with sum of NA's
SNP = t(as.matrix(SNP)) # transpose rows and columns
set.seed(1)
snp.na<-SNP
snp.roughfix<-na.roughfix(snp.na)
fSNP<-factor(snp.roughfix[, 1]) # Assigns factor to case control status
snp.narf<- randomForest(snp.roughfix[,-1], fSNP, na.action=na.roughfix, ntree=500, mtry=10, importance=TRUE, keep.forest=FALSE, do.trace...
2011 Dec 02
2
Imputing data
So I have a very big matrix of about 900 by 400 and there are a couple of NA
in the list. I have used the following functions to impute the missing data
data(pc)
pc.na<-pc
pc.roughfix <- na.roughfix(pc.na)
pc.narf <- randomForest(pc.na, na.action=na.roughfix)
yet it does not replace the NA in the list. Presently I want to replace the
NA with maybe the mean of the rows or columns or some type of correlation.
Any help would be appreciated.
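One point that may explain the symptom: R functions do not modify their arguments in place, so the filled-in values would be in the returned object (pc.roughfix above), while pc.na keeps its NA. For the column-mean replacement asked about, here is a small base-R sketch. The matrix m is made-up toy data standing in for the poster's matrix.

```r
# Made-up toy matrix (filled column by column), for illustration only
m <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 3)

# Mean of each column, ignoring missing entries
col_means <- colMeans(m, na.rm = TRUE)

# (row, col) positions of every NA
miss <- which(is.na(m), arr.ind = TRUE)

# Replace each NA with the mean of the column it sits in
m[miss] <- col_means[miss[, "col"]]
print(m)
```

Swapping colMeans() for rowMeans() and indexing with miss[, "row"] would give row-mean replacement instead.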
2007 Jan 10
1
Fw: Memory problem on a linux cluster using a large data set [Broadcast]
...> > snp.na<-SNP
>
> R might be clever enough to figure out that this simple
> assignment does not trigger a copy. But it probably means
> that any subsequent modification of snp.na or SNP *will*
> trigger a copy, so avoid the assignment if possible.
>
> > snp.roughfix<-na.roughfix(snp.na)
>
> > fSNP<-factor(snp.roughfix[, 1]) # Assigns
> factor to case control status
> >
> > snp.narf<- randomForest(snp.roughfix[,-1], fSNP,
> > na.action=na.roughfix, ntree=500,...
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package.
Here is a sample...
clunk.roughfix<-na.roughfix(clunk)
>
> clunk.impute<-rfImpute(CONVERT~.,data=clunk)
ntree OOB 1 2
300: 26.80% 3.83% 85.37%
ntree OOB 1 2
300: 18.56% 5.74% 51.22%
Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree,
:
NA not permitted...
2006 Dec 21
1
Memory problem on a linux cluster using a large data set [Broadcast]
...> > snp.na<-SNP
>
> R might be clever enough to figure out that this simple
> assignment does not trigger a copy. But it probably means
> that any subsequent modification of snp.na or SNP *will*
> trigger a copy, so avoid the assignment if possible.
>
> > snp.roughfix<-na.roughfix(snp.na)
>
> > fSNP<-factor(snp.roughfix[, 1]) # Assigns
> factor to case control status
> >
> > snp.narf<- randomForest(snp.roughfix[,-1], fSNP,
> > na.action=na.roughfix, ntree=500,...
2012 Mar 26
1
NA in R package randomForest
I have a question regarding NA in randomForest (in R). I have a dataset
which includes both numerical and non-numerical variables, and the data
includes some NA. I tried to use na.roughfix but then I get an error
message "na.roughfix only works for numeric or factor". I also tried
rfImpute but this does not work either because I have some NA in my
response variable. Does anyone have some tips on how I can deal with this?
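One possible preparation step, sketched in base R under the assumption that the non-numeric variables are character columns: drop the rows whose response is missing, then convert character columns to factors so that every column is numeric or factor. The data frame toy and its column names are hypothetical.

```r
# Made-up toy data: y is the response, txt is a character predictor
toy <- data.frame(
  y   = factor(c("yes", NA, "no", "yes")),
  num = c(1.5, 2.0, NA, 4.0),
  txt = c("a", NA, "b", "a"),
  stringsAsFactors = FALSE
)

# Rows with a missing response cannot be used for training, so drop them
train <- toy[!is.na(toy$y), , drop = FALSE]

# Convert character columns to factors; numeric columns stay as they are
is_chr <- vapply(train, is.character, logical(1))
train[is_chr] <- lapply(train[is_chr], factor)

str(train)
```

After this step the remaining NA are in predictors only, which is the situation the imputation functions mentioned in the post are meant for.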
2003 Aug 05
1
na.action in randomForest --- Summary
...n that categorical. My
impression is that because of the randomness and the many trees grown,
filling in missing values with a sensible values does not effect accuracy
much." (from his report, "Manual On Setting Up, Using, And Understanding
Random Forests V3.1").
I now plan to try the na.roughfix option from Liaw's package.
Thanks to Uwe Ligges and Brian Ripley for their replies to my posting.
Dave Parkhurst
2010 Dec 11
1
randomForest: help with combine() function
...st[[i]]$votes), 0, rflist[[i]]$votes) :
non-conformable arrays
In addition: Warning message:
In rf$oob.times + rflist[[i]]$oob.times :
longer object length is not a multiple of shorter object length
Both RF models use the same variables, although the NAs in both models
likely differ (using na.roughfix in both models). I assume this is
part of the reason that my arrays are "non-conformable". If so, does
anyone have any suggestions on how to combine in such a situation? How
similar do RFs have to be in order to combine?
Cheers
2009 Jan 10
0
Rserve/RandomForest does not work with a CSV?
Hi all,
We're using Rserve and RandomForest to do classification from within a
Java program. The total is about 4 lines of R code:
library('randomForest')
x
y
future
fit<-randomForest(x,y,no.action=na.roughfix,importance=T,proximity=T)
p<-predict(fit, future)
What is very frustrating is that we have tried this two different ways
(both work in R):
1. Load x, y, and future from a CSV. If I do this, Rserve throws an
error when randomForest() is called.
2. Load x, y, and future by using arrays, and...
2008 Apr 29
1
randomForest and ordered factors
Hello R-user!
I am running R 2.7.0 on a Power Book (Tiger). (I am still R and
statistics beginner)
I try to find the most important variables to divide my dataset as
given in a categorical variable.
code:
Test.rf4<-randomForest(Sex~.,na.action=na.roughfix, data=Subset4,
importance=TRUE, proximity=TRUE, ntree=10000, do.trace=1000,
keep.forest=FALSE)
My dataset contains also ordered factors classified as such.
Is randomForest able to deal with it, does it change anything or is
there no difference in using factors or ordered factors?
Many thank...
2011 Jan 03
1
randomForest speed improvements
...;);
data202 <- read.csv ("random.csv", header=TRUE);
x<- data202[1:50000,1:6];
y<- data202[1:50000,8];
y<- y[,drop=TRUE];
x2 <- data202[50001:60000,1:6];
y2 <- data202[50001:60000,8];
y2 <- y2[,drop=TRUE];
RFobject <- randomForest(x,y,na.action=na.roughfix);
p <- predict (RFobject, x2);
In this case, the CSV contains 10 columns, of which 1-6 are numeric in
nature (day of week, week of month, etc...) and column 8 is the target
(sales, a numeric number).
randomForest does fine with the data, our issue is how long it takes. In
this case, about 5...
2011 Feb 10
2
R 2.12.1 Windows 32bit and 64bit - are numerical differences expected?
...23)])
print(model)
On 32bit: Train Error: 0.057
On 64bit: Train Error: 0.055
Changing the seed to 42, for example, brings them into sync.
library(randomForest)
set.seed(41)
model <- randomForest(RainTomorrow ~ ., data=weather[-c(1, 2, 23)],
importance=TRUE, na.action=na.roughfix)
print(model)
On 32bit: OOB estimate of error rate: 12.84%
On 64bit: OOB estimate of error rate: 11.75%
> sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=Eng...
2011 Oct 11
1
Mean or mode imputation for missing values
Dear R experts,
I have a large database made up of mixed data types (numeric,
character, factor, ordinal factor) with missing values, and I am
looking for a package that would help me impute the missing values
using either the mean if numerical or the mode if character/factor.
I maybe could use replace like this:
df$var[is.na(df$var)] <- mean(df$var, na.rm = TRUE)
And go through all the many
2004 Jul 26
5
installing problems repeated.tgz linux
Hi,
I have tried several possibilities and searched the archive,
but have not succeeded in installing J. Lindsey's useful "repeated"
library on my Linux system (SuSE 9.0 with kernel 2.6.7, R 1.9.1).
P.S. On Windows it works fine.
Many thanks for help
Christian
chris at linux:/space/downs> R CMD INSTALL - l /usr/lib/R/library repeated
WARNING: invalid package '-'
WARNING:
2004 Jan 12
0
new version of randomForest (4.0-7)
...e() function for extracting the importance
measure.
o The predict() method has an option to return predictions by the component
trees.
o There is a new getTree() function for looking at one of the trees in the
forest.
o For dealing with missing values in the predictor variables, there are
na.roughfix() and rfImpute(), which correspond to the `missquick' and
`missright' options in Breiman's V4/V5 code. Both work for classification
as well as regression.
o There is an experimental bias reduction step in regression (the corr.bias
argument in randomForest) that could be very effecti...
2004 Jul 08
0
randomForest 4.3-0 released
...move rows with NAs from the data frame given.
* For regression, if proximity=FALSE, an n by n array of integers is
erroneously allocated but not used (it's only used for proximity
calculation, so not needed otherwise).
* Updated combine() to conform to the new randomForest object.
* na.roughfix() was not working correctly for matrices, which in turn
caused problems in rfImpute().
Changes in 4.1-0:
* In randomForest(), if sampsize is given, the sampling is now done
without replacement, in addition to stratified by class. Therefore
sampsize can not be larger than the class freq...
2004 Mar 31
3
help with the usage of "randomForest"
Dear all,
Can anybody give me some hint on the following error msg I got with using
randomForest?
I have two-class classification problem. The data file "sample" is:
----------------------------------------------------------
udomain.edu udomain.hcs hpclass
1 1.0000 1 not
2 NA 2 not
3 NA 0.8 not
4 NA 0.2 hp
5 NA 0.9 hp
------------------------------------------------------------
The