thr3ads.net - similar to: "anyone know why package "RandomForest" na.roughfix is so slow??"

Displaying 20 results from an estimated 800 matches similar to: "anyone know why package "RandomForest" na.roughfix is so slow??"

manipulating the Date & Time classes

2011 Feb 08

Hello, This is mostly to developers, but in case I missed something in my literature search, I am sending this to the broader audience. - Are there any plans in the works to make "time" classes a bit more friendly to the rest of the "R" world? I am not suggesting to allow for fancy functions to manipulate times, per se, or to figure out how to properly

question regarding "varImpPlot" results vs. model$importance data on package "RandomForest"

2010 Jul 13

Hi everyone, I have another "Random Forest" package question: - my (presumably incorrect) understanding of the varImpPlot is that it should plot the "% increase in MSE" and "IncNodePurity" exactly as can be found from the "importance" section of the model results. - However, the plot does not, in fact, match the "importance"

How to get 'R' to talk BACK to other languages / scripts??

2010 Dec 03

Hey everyone, I know that I can call 'R' from other scripts, and that I can make command calls from 'R' (e.g., using system() ). But how can I get 'R' to RETURN values to the script that called it. E.g., I would like to be able to do something like the following (as a simpler example) from a bash script: #!/bin/bash myTest=echo /usr/local/bin/R --no-restore

how to convert "sloppy data" into a time series?

2010 Dec 17

Hi All, First let me state that I did search for a while on r-help, google, and using the "sos" package inside of 'R', without much luck. I want to know how to create a univariate time series from a set of data that will have huge time gaps in it. For instance, here is a snapshot of a piece of data that I would like to analyze: *Row queued_time

ideas, modeling highly discrete time-series data

2010 Dec 20

Hello all, First of all, thanks to those of you who helped me a week or so ago managing a time series with varying gaps between the data series in 'R'. (My final preferred solution was to use "its" function & then forecast(Arima( ) ). ) My next question is a general statistical question where I'd like some advice, for those willing / able to proffer any wisdom:

odd behavior of "summary" function

2010 Aug 24

Hello All, Using the standard "summary" function in 'R', I ran across some odd behavior that I cannot understand. Easy to reproduce:

Typing: summary(c(6,207936))

Yields:

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      6   51990  104000  104000  156000  207900

None of these values are correct except for the minimum. If I perform "quantile(c(6,
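The numbers printed in this snippet are consistent with rounding to four significant digits, which older releases of summary() applied by default. That explanation is an assumption here, since the snippet is cut off before any reply. A minimal base-R sketch that reproduces the printed values from the exact ones:

```r
# Two-value vector from the post
x <- c(6, 207936)

# Exact quantiles plus the mean, reordered to match the summary() layout
exact <- c(quantile(x), mean(x))[c(1, 2, 3, 6, 4, 5)]
names(exact) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")

# Rounding to 4 significant digits gives the values shown in the post
rounded <- signif(exact, 4)

print(exact)
print(rounded)
```

If rounding is indeed the cause, asking for more digits, for example summary(x, digits = 10), should display values closer to the exact ones.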

syntax for extending a line in a script??

2011 Jan 12

Hello, A hopefully simple question. I use 'R' through emacs, but I suspect the following would occur with any manner of text editor: - my editor has a normally quite handy feature where it will automatically indent to the appropriate level when I start a new line. However, this occasionally creates cases where there is no friendly way to break a long line of code into

NA in R package randomForest

2012 Mar 26

I have a question regarding NA in randomForest (in R). I have a dataset which includes both numerical and non-numerical variables, and the data includes some NA. I tried to use na.roughfix but then I get an error message "na.roughfix only works for numeric or factor". I also tried rfImpute but this does not work either because I have some NA in my response variable. Does anyone have som
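The error message quoted in this snippet suggests that at least one column is neither numeric nor a factor, for example a character column. That is an assumption, as the poster's data is not shown. A minimal base-R sketch, using a hypothetical data frame with made-up column names, that converts character columns to factors and sets aside rows with a missing response:

```r
# Hypothetical data frame: a numeric response, a numeric predictor,
# and a character predictor, each with missing values
df <- data.frame(
  y  = c(1.2, NA, 3.4, 2.2),
  x1 = c(10, NA, 30, 40),
  x2 = c("a", "b", NA, "a"),
  stringsAsFactors = FALSE
)

# Convert character columns to factors so every column is numeric or factor
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], factor)

# Keep only rows where the response is known
df_known <- df[!is.na(df$y), , drop = FALSE]

str(df_known)
```

After this step, the predictors in df_known could be passed to na.roughfix() from the randomForest package; that call is not run here so the sketch stays free of package dependencies.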

how can I evaluate a formula passed as a string?

2010 Jun 24

Hey everyone, I've been using 'R' long enough that I should have some idea of what the heck either expression() or eval() are really ever useful for. I come across another instance where I WISH they would be useful, but I cannot get them to work. Here is the crux of what I would like to do: presume df looks like this A B C === === === M 45 0 M
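The snippet is truncated before the poster's actual goal, so the following is only a sketch of the general technique named in the title. It assumes a hypothetical data frame whose column names A, B, C mirror the post (the values are invented) and an expression supplied as text:

```r
# Hypothetical data frame; column names follow the post, values are made up
df <- data.frame(A = c("M", "M", "F"), B = c(45, 30, 20), C = c(0, 1, 1))

# An expression supplied as a string
expr_text <- "B * 2 + C"

# parse() turns the text into an unevaluated expression;
# eval() evaluates it using the columns of df as variables
result <- eval(parse(text = expr_text), envir = df)
print(result)

# For model formulas, as.formula() converts a string directly
f <- as.formula("C ~ B")
print(class(f))
```

Evaluating parsed text runs arbitrary code, so this approach is best kept to strings from a trusted source.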

rfImpute

2007 Aug 10

I am having trouble with the rfImpute function in the randomForest package. Here is a sample... clunk.roughfix<-na.roughfix(clunk) > > clunk.impute<-rfImpute(CONVERT~.,data=clunk) ntree OOB 1 2 300: 26.80% 3.83% 85.37% ntree OOB 1 2 300: 18.56% 5.74% 51.22% Error in randomForest.default(xf, y, ntree = ntree, ..., do.trace = ntree, : NA not

na.action in randomForest --- Summary

2003 Aug 05

A few days ago I asked whether there were options other than na.action=na.fail for the R port of Breiman's randomForest; the function's help page did not say anything about other options. I have since discovered that a pdf document called "The randomForest Package" and made available by Andy Liaw (who made the tool available in R---thank you) does discuss an option. It is an implementation of

Imputing data

2011 Dec 02

So I have a very big matrix of about 900 by 400 and there are a couple of NA in the list. I have used the following functions to impute the missing data data(pc) pc.na<-pc pc.roughfix <- na.roughfix(pc.na) pc.narf <- randomForest(pc.na, na.action=na.roughfix) yet it does not replace the NA in the list. Presently I want to replace the NA with maybe the mean of the rows or columns or
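For the column-mean replacement the poster mentions, a minimal base-R sketch follows. The small matrix is a hypothetical stand-in for the 900 by 400 one in the post, and this is plain mean imputation rather than anything from the randomForest package:

```r
# Small hypothetical matrix standing in for the one in the post
m <- matrix(c(1, NA, 3,
              4,  5, NA,
              7,  8, 9), nrow = 3, byrow = TRUE)

# Column means, ignoring missing values
col_means <- colMeans(m, na.rm = TRUE)

# Positions of the missing cells as (row, col) index pairs
na_pos <- which(is.na(m), arr.ind = TRUE)

# Fill each missing cell with the mean of its column
m_filled <- m
m_filled[na_pos] <- col_means[na_pos[, "col"]]

print(m_filled)
```

Row means work the same way with rowMeans() and na_pos[, "row"]. Note also that na.roughfix() returns a new, filled-in object and leaves its input unchanged, so in the post's code the result would be in pc.roughfix rather than pc.na.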

trouble with RODBC -- chopping off part of column names

2010 Oct 01

Hello all, I have a strange / interesting problem that might be 'R' settings themselves, or it might be something with the OS. I am using the RODBC library. I have a script that goes out and, before making a query for a big data set, will first query for the column names of the data set. The column names could sometimes be quite long (e.g., "Time Background Estimation

ggplot2 histograms... a subtle error found

2010 Jul 29

Hello all, I have a peculiar and particular bug that I stumbled across with ggplot2. I cannot seem to replicate it with anything other than my specific data set. Here is the problem: - when I try to plot a histogram, allowing for ggplot2 to decide the binwidths itself, I get the following error: - stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to

garbage collection & memory leaks in 'R', it seems...

2010 Jul 16

Hello developers, I noticed that if I am running 'R', type "rm(list=objects())" and "gc()", 'R' will still be consuming (a lot) more memory than when I then close 'R' and re-open it. In my ignorance, I'm presuming this is something in 'R' where it doesn't really do a great job of garbage collection... at least not nearly as well as

Memory problem on a linux cluster using a large data set

2006 Dec 18

Hello, I have a large data set: 320.000 rows and 1000 columns. All the data has the values 0,1,2. I wrote a script to remove all the rows with more than 46 missing values. This works perfectly on a smaller dataset. But when I try to run it on the larger data set, I get the error “cannot allocate vector size 1240 kb”. I’ve searched through previous posts and found out that it might

randomForest: help with combine() function

2010 Dec 11

I've built two RF objects (RF1 and RF2) and have tried to combine them, but I get the following error: Error in rf$votes + ifelse(is.na(rflist[[i]]$votes), 0, rflist[[i]]$votes) : non-conformable arrays In addition: Warning message: In rf$oob.times + rflist[[i]]$oob.times : longer object length is not a multiple of shorter object length Both RF models use the same variables, although

Fw: Memory problem on a linux cluster using a large data set [Broadcast]

2007 Jan 10

Hi, I listened to all your advice and ran my data on a computer with a 64-bit processor, but I still get the same error saying "it cannot allocate a vector of that size 1240 kb". I don't want to cut my data into smaller pieces because we are looking at interaction. So are there any other options for me to try out, or should I wait for the development of more advanced computers!

randomForest and ordered factors

2008 Apr 29

Hello R-user! I am running R 2.7.0 on a Power Book (Tiger). (I am still an R and statistics beginner.) I am trying to find the most important variables for dividing my dataset according to a categorical variable. code: Test.rf4<-randomForest(Sex~.,na.action=na.roughfix, data=Subset4, importance=TRUE, proximity=TRUE, ntree=10000, do.trace=1000, keep.forest=FALSE) My dataset also contains ordered
