similar to: Examples of "classwt", "strata", and "sampsize" in randomForest?

2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
Sorry for the repost, but I've really been looking, and can't find any syntax direction on this issue... Just browsing the documentation, and searching the list came up short... I have some unbalanced data and was wondering if, in a "0" v "1" classification forest, some combo of these options might yield better predictions when the proportion of one class is low (less
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
"classwt" in the current version of the randomForest package doesn't work too well. (It's what was in version 3.x of the original Fortran code by Breiman and Cutler, not the one in the new Fortran code.) I'd advise against using it. "sampsize" and "strata" can be use in conjunction. If "strata" is not specified, the class labels will be used.
2007 Jan 28
2
help with RandomForest classwt option
Hello there, I am working on an extremely unbalanced two-class classification problem. I want to use "classwt" together with "down sampling". By checking the rfNews() in R, it looks like classwt is not working yet. Then I looked at the software from Salford. I did not find the down sampling option. I am wondering if you have any experience dealing with this problem. Do you
2008 May 21
1
How to use classwt parameter option in RandomForest
Hi, I am trying to model a dataset with the response variable Y, which has 6 levels { Great, Greater, Greatest, Weak, Weaker, Weakest}, and predictor variables X, with continuous and factor variables using random forests in R. The variable Y acts like an ordinal variable, but I recoded it as factor variable. I ran a simulation and got OOB estimate of error rate 60%. I validated against some
2011 Sep 13
1
class weights with Random Forest
Hi All, I am looking for a reference that explains how the randomForest function in the randomForest package uses the classwt parameter. Here: http://tolstoy.newcastle.edu.au/R/e4/help/08/05/12088.html Andy Liaw suggests not using classwt. And according to: http://r.789695.n4.nabble.com/R-help-with-RandomForest-classwt-option-td817149.html it has "not been implemented" as of 2007.
2012 Mar 03
0
Strategies to deal with unbalanced classification data in randomForest
Hello all, I have become somewhat confused with options available for dealing with a highly unbalanced data set (10000 in one class, 50 in the other). As a summary I am unsure: a) if I am performing the two class weighting methods properly, b) if the data are too unbalanced and that this type of analysis is appropriate and c) if there is any interaction between the weighting for class imbalances
2005 Nov 07
1
R seems to "stall" after several hours on a long series of analyses... where to start?
You can test if the problem is accumulation in memory registers, which is certainly what this sounds like. Just do a loop over a reasonably small number of iterations and store or print the time between each iteration. If it is memory accumulation, it will run optimally for the first few iterations, after which the time will increase noticeably (essentially exponentially, hence it ultimately freezes up). If
2006 Jan 25
1
imbalanced classes
Hi Andy, I know this topic has been discussed before on the R-help, but I was wondering if you could offer some advice specific to my application. I'm using the R random forest package to compare two classes of data; the number of cases in each class is relatively low, 28 in class 1 and 9 in class 2. I'd really like to use the R environment to analyze this data, however I'm finding it
2007 Apr 29
1
randomForest gives different results for formula call v. x, y methods. Why?
Just out of curiosity, I took the default "iris" example in the RF helpfile... but seeing the admonition against using the formula interface for large data sets, I wanted to play around a bit to see how the various options affected the output. Found something interesting I couldn't find documentation for... Just like the example... > set.seed(12) # to be sure I have
2007 Mar 23
1
memory, speed, and assigning results into new v. existing variable
I have a very large data frame, and I'm doing a conversion of all columns into factors. Takes a while (thanks to folks here though, for making faster!), but am wondering about optimization from a memory perspective... Internally, am I better off assigning into a new data frame, or doing one of these: dataframe<-someoperation(dataframe) It would seem that re-assigning into the same data
2005 Aug 10
2
Creating new columns inside a loop
Ok, I know R isn't an optimal environment for looping (or so I've heard) but I have a need to loop through columns of data and create new columns of data based on calculations within rows... I'm sure there's a help file, but I'm not sure what search terms to use to find it! The problem is that these new columns need to have names that I can later access... Like NewVar1,
2008 Mar 09
1
sampsize in Random Forests
Hi all, I have a dataset where each point is assigned to a class A, B, C, or D. Each point is also assigned to a study site. Each study site is coded with a number ranging from 1 to 100. This information is stored in the vector studySites. I want to run randomForests using stratified sampling, so I chose the option strata = factor(studySites) But I am not sure how to control the number of
2007 Mar 15
2
replacing all NA's in a dataframe with zeros...
I've seen how to replace the NA's in a single column within a data frame: > mydata$ncigs[is.na(mydata$ncigs)]<-0 But this is just one column... I have thousands of columns (!) that I need to do this, and I can't figure out a way, outside of the dreaded loop, to replace all NA's in an entire data frame (all vars) without naming each var separately. Yikes. I'm racking my
2005 Sep 13
1
Anyone have any code for importing data from NAMCS?
The National Ambulatory Medical Care Survey is a free data set from the CDC that I'd like to analyze using the "Survey" package in R. Before I dive in, though, it occurred to me that someone may already have gone to the trouble of writing code that will bring in the data and assign the variable names and value labels. This is a big file, so doing it from scratch will take
2010 Nov 21
1
abline(h=whatever) not working in candleChart() (in quantmod)?
Hello, all-- I am having some fun playing with the graphing in quantmod-- very nice! I am writing a function to calculate (and hopefully plot) support and resistance lines, but the usual plot call of "abline(h=value)" does not seem to work. Here's my code: require(quantmod) AAPL<-getYahooData("AAPL") candleChart(AAPL,subset="last 3
2005 May 15
1
Not sure if this is "aggregate" or some other task.
I have data where I've taken some measurements three times... twice in rapid succession so I could check test-retest reliability of a piece of equipment, and then a third measurement some time later. Now I'd like to do an analysis where I have two scores... the first being the mean of the first two taken the same day, and the second being the one taken later. I have a lot of
2005 Nov 23
2
TryCatch() with read.csv("http://...")
Hi, folks! I'm trying to pull in data using read.csv("my URL goes here"), and it really works fantastically. Amazing to pull in live data right off the internet, into RAM, and get busy... however... occasionally there is a server problem, or the data are not up yet, and instead of pushing through a nice CSV file, the server sends a 404 "Not Found" page... Since the
2006 May 23
1
Survey proportions... Can I use population as denominator?
Just giving the survey package a spin... I'm accustomed to stata, and it seems very similar in many respects. One thing is throwing me, however. I've gotten my data in, and specified the design. Looks like the weighting is right (based on published population estimates from these data), but now I'd like to check my "marginal means" for proportions against those that have
2005 Oct 09
1
Insert value from same column of another row (lag across observations)
I know I've done this before, but it's been a while and I can't find quite what I need in the help files or archives. I have a text field in a very large data frame. I'd like to add a column that represents the value from an existing field, from the next record (the data are sorted). I'm trying to represent "what happens tomorrow", so the "today" row would
2005 Nov 07
4
R seems to "stall" after several hours on a long series of analyses... where to start?
Not sure where to even start on this.... I'm hoping there's some debugging I can do... I have a loop that cycles through several different data sets (same structure, different info), performing randomForest growth and predictions... saving out the predictions for later study... I get about 5 hours in (9%... of the planned iterations.. yikes!) and R just freezes. This happens in