Displaying 20 results from an estimated 800 matches similar to: "anyone know why package "RandomForest" na.roughfix is so slow??"
2011 Feb 08
4
manipulating the Date & Time classes
Hello,
This is mostly to developers, but in case I missed something in my
literature search, I am sending this to the broader audience.
- Are there any plans in the works to make "time" classes a bit more
friendly to the rest of the "R" world? I am not suggesting to allow for
fancy functions to manipulate times, per se, or to figure out how to
properly
2010 Jul 13
1
question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"
Hi everyone,
I have another "Random Forest" package question:
- my (presumably incorrect) understanding of the varImpPlot is that it
should plot the "% increase in MSE" and "IncNodePurity" exactly as can be
found from the "importance" section of the model results.
- However, the plot does not, in fact, match the "importance"
2010 Dec 03
2
How to get 'R' to talk BACK to other languages / scripts??
Hey everyone,
I know that I can call 'R' from other scripts, and that I can make
command calls from 'R' (e.g., using system() ). But how can I get 'R' to
RETURN values to the script that called it. E.g., I would like to be able
to do something like the following (as a simpler example) from a bash
script:
#!/bin/bash
myTest=echo /usr/local/bin/R --no-restore
2010 Dec 17
2
how to convert "sloppy data" into a time series?
Hi All,
First let me state that I did search for a while on r-help, google, and
using the "sos" package inside of 'R', without much luck. I want to know
how to create a univariate time series from a set of data that will have
huge time gaps in it. For instance, here is a snapshot of a piece of data
that I would like to analyze:
*Row queued_time
2010 Dec 20
1
ideas, modeling highly discrete time-series data
Hello all,
First of all, thanks to those of you who helped me a week or so ago
managing a time series with varying gaps between the data series in 'R'.
(My final preferred solution was to use "its" function & then
forecast(Arima( ) ). )
My next question is a general statistical question where I'd like some
advice, for those willing / able to proffer any wisdom:
2010 Aug 24
3
odd behavior of "summary" function
Hello All,
Using the standard "summary" function in 'R', I ran across some odd
behavior that I cannot understand. Easy to reproduce:
Typing:
summary(c(6,207936))
Yields:
Min. 1st Qu. Median Mean 3rd Qu. Max.
6 51990 104000 104000 156000 207900
None of these values are correct except for the minimum. If I perform
"quantile(c(6,
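
A plausible explanation, offered here as an assumption rather than as the thread's confirmed answer: summary() has a `digits` argument, and in R versions of that era the reported values were rounded to four significant digits. The sketch below only uses quantile() and signif() to show that such rounding reproduces the numbers quoted above.

```r
x <- c(6, 207936)

# Exact quantiles, with no rounding applied.
exact <- quantile(x)
print(exact)

# Rounding each value to 4 significant digits gives the figures
# shown in the summary() output quoted in the post.
# (Assumption: 4 is the effective default for summary()'s digits.)
rounded <- signif(exact, 4)
print(rounded)

# The mean is rounded in the same way.
rounded_mean <- signif(mean(x), 4)
print(rounded_mean)
```

If this is the cause, passing a larger `digits` value to summary() would be the natural thing to try.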
2011 Jan 12
2
syntax for extending a line in a script??
Hello,
A hopefully simple question. I use 'R' through emacs, but I suspect the
following would occur with any manner of text editor:
- my editor has a normally quite handy feature where it will
automatically indent to the appropriate level when I start a new line.
However, this occasionally creates cases where there is no friendly way to
break a long line of code into
2012 Mar 26
1
NA in R package randomForest
I have a question regarding NA in randomForest (in R). I have a dataset
which includes both numerical and non-numerical variables, and the data
includes some NA. I tried to use na.roughfix but then I get an error
message "na.roughfix only works for numeric or factor". I also tried
rfImpute but this does not work either because I have some NA in my
response variable. Does anyone have som
2010 Jun 24
1
how can I evaluate a formula passed as a string?
Hey everyone,
I've been using 'R' long enough that I should have some idea of what the
heck either expression() or eval() are really ever useful for. I come
across another instance where I WISH they would be useful, but I cannot get
them to work.
Here is the crux of what I would like to do:
presume df looks like this
A B C
=== === ===
M 45 0
M
2007 Aug 10
1
rfImpute
I am having trouble with the rfImpute function in the randomForest package.
Here is a sample...
clunk.roughfix<-na.roughfix(clunk)
>
> clunk.impute<-rfImpute(CONVERT~.,data=clunk)
ntree OOB 1 2
300: 26.80% 3.83% 85.37%
ntree OOB 1 2
300: 18.56% 5.74% 51.22%
Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, :
NA not
2003 Aug 05
1
na.action in randomForest --- Summary
A few days ago I asked whether there were options other than
na.action=na.fail for the R port of Breiman's randomForest; the function's
help page did not say anything about other options.
I have since discovered that a pdf document called "The randomForest
Package" and made available by Andy Liaw (who made the tool available in
R---thank you) does discuss an option. It is an implementation of
2011 Dec 02
2
Imputing data
So I have a very big matrix of about 900 by 400 and there are a couple of NA
in the list. I have used the following functions to impute the missing data
data(pc)
pc.na<-pc
pc.roughfix <- na.roughfix(pc.na)
pc.narf <- randomForest(pc.na, na.action=na.roughfix)
yet it does not replace the NA in the list. Presently I want to replace the
NA with maybe the mean of the rows or columns or
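
For the column-mean replacement the poster mentions, a minimal base-R sketch might look like the following. The matrix `m` and the helper `impute_col_mean` are hypothetical names invented for illustration; they are not part of randomForest. Note that na.roughfix itself fills numeric columns with the column median rather than the mean.

```r
# Hypothetical small matrix standing in for the 900 x 400 data.
# matrix() fills column by column: col 1 = (1, NA, 3), col 2 = (4, 5, NA).
m <- matrix(c(1, NA, 3,
              4, 5, NA), nrow = 3)

# Replace each NA with the mean of the non-missing values in its column.
# A column that is entirely NA would end up filled with NaN.
impute_col_mean <- function(mat) {
  for (j in seq_len(ncol(mat))) {
    missing <- is.na(mat[, j])
    mat[missing, j] <- mean(mat[, j], na.rm = TRUE)
  }
  mat
}

filled <- impute_col_mean(m)
print(filled)
```

The function returns a new matrix and leaves its input untouched, so the result has to be assigned to a variable and that variable passed on to later steps.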
2010 Oct 01
2
trouble with RODBC -- chopping off part of column names
Hello all,
I have a strange / interesting problem that might be 'R' settings
themselves, or it might be something with the OS.
I am using the RODBC library. I have a script that goes out and, before
making a query for a big data set, will first query for the column names of
the data set. The column names could sometimes be quite long (e.g., "Time
Background Estimation
2010 Jul 29
2
ggplot2 histograms... a subtle error found
Hello all,
I have a peculiar and particular bug that I stumbled across with
ggplot2. I cannot seem to replicate it with anything other than my specific
data set.
Here is the problem:
- when I try to plot a histogram, allowing for ggplot2 to decide the
binwidths itself, I get the following error:
- stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to
2010 Jul 16
1
garbage collection & memory leaks in 'R', it seems...
Hello developers,
I noticed that if I am running 'R', type "rm(list=objects())" and
"gc()", 'R' will still be consuming (a lot) more memory than when I then
close 'R' and re-open it. In my ignorance, I'm presuming this is something
in 'R' where it doesn't really do a great job of garbage collection... at
least not nearly as well as
2006 Dec 18
1
Memory problem on a linux cluster using a large data set
Hello,
I have a large data set 320.000 rows and 1000 columns. All the data has the values 0,1,2.
I wrote a script to remove all the rows with more than 46 missing values. This works perfectly on a smaller dataset. The problem arises when I try to run it on the larger data set: I get an error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might
2010 Dec 11
1
randomForest: help with combine() function
I've built two RF objects (RF1 and RF2) and have tried to combine
them, but I get the following error:
Error in rf$votes + ifelse(is.na(rflist[[i]]$votes), 0, rflist[[i]]$votes) :
non-conformable arrays
In addition: Warning message:
In rf$oob.times + rflist[[i]]$oob.times :
longer object length is not a multiple of shorter object length
Both RF models use the same variables, although
2007 Jan 10
1
Fw: Memory problem on a linux cluster using a large data set [Broadcast]
Hi
I listened to all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying "it cannot allocate a vector of that size 1240 kb". I don't want to cut my data into smaller pieces because we are looking at interaction. So are there any other options for me to try out, or should I wait for the development of more advanced computers?
2008 Apr 29
1
randomForest and ordered factors
Hello R-user!
I am running R 2.7.0 on a Power Book (Tiger). (I am still an R and
statistics beginner)
I try to find the most important variables to divide my dataset as
given in a categorical variable.
code:
Test.rf4<-randomForest(Sex~.,na.action=na.roughfix, data=Subset4,
importance=TRUE, proximity=TRUE, ntree=10000, do.trace=1000,
keep.forest=FALSE)
My dataset contains also ordered