Displaying 20 results from an estimated 1000 matches similar to: "Random Forest with highly imbalanced data"
2006 Jan 25
1
imbalanced classes
Hi Andy,
I know this topic has been discussed before on R-help, but I was
wondering if you could offer some advice specific to my application.
I'm using the R random forest package to compare two classes of data;
the number of cases in each class is relatively low, 28 in class 1 and 9
in class 2. I'd really like to use the R environment to analyze this data,
however I'm finding it
2011 Sep 13
1
class weights with Random Forest
Hi All,
I am looking for a reference that explains how the randomForest function in
the randomForest package uses the classwt parameter. Here:
http://tolstoy.newcastle.edu.au/R/e4/help/08/05/12088.html
Andy Liaw suggests not using classwt. And according to:
http://r.789695.n4.nabble.com/R-help-with-RandomForest-classwt-option-td817149.html
it has "not been implemented" as of 2007.
2007 Jan 28
2
help with RandomForest classwt option
Hello there,
I am working on an extremely unbalanced two-class classification problem. I
want to use "classwt" together with "down sampling". Checking rfNews()
in R, it looks like classwt is not working yet. Then I looked at the
software from Salford. I did not find the down-sampling option. I am
wondering if you have any experience dealing with this problem. Do you
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
"classwt" in the current version of the randomForest package doesn't work
too well. (It's what was in version 3.x of the original Fortran code by
Breiman and Cutler, not the one in the new Fortran code.) I'd advise
against using it.
"sampsize" and "strata" can be used in conjunction. If "strata" is not
specified, the class labels will be used.
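
As an illustration of the advice in this snippet, here is a minimal sketch of combining "strata" and "sampsize" to down-sample the majority class. It is not taken from the original post: the simulated data, the class sizes, and the per-class sample size of 20 are all assumptions chosen for the example, and it presumes the randomForest package is installed.

```r
# Sketch only: assumes the randomForest package is installed.
library(randomForest)

set.seed(1)

# Hypothetical imbalanced data: 200 cases of class "0", 20 of class "1".
n0 <- 200
n1 <- 20
x <- data.frame(a = c(rnorm(n0, mean = 0), rnorm(n1, mean = 2)),
                b = c(rnorm(n0, mean = 0), rnorm(n1, mean = 2)))
y <- factor(c(rep("0", n0), rep("1", n1)))

# Draw 20 cases from each class for every tree (balanced down-sampling).
# strata = y is spelled out for clarity; according to the post above, the
# class labels are used when strata is not specified.
rf <- randomForest(x = x, y = y, strata = y,
                   sampsize = c(20, 20), ntree = 200)

# Out-of-bag confusion matrix, with per-class error in the last column.
print(rf$confusion)
```

The elements of sampsize follow the order of levels(y), so the first 20 applies to class "0" and the second to class "1".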
2004 Jan 20
1
random forest question
Hi,
here are three results of random forest (version 4.0-1).
The results seem to be more or less the same which is strange because I
changed the classwt.
I hoped that for example classwt=c(0.45,0.1,0.45) would result in fewer
cases classified as class 2. Did I understand something wrong?
Christian
x1rf <- randomForest(x=as.data.frame(mfilters[cvtrain,]),
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
Sorry for the repost, but I've really been looking, and can't find any
syntax direction on this issue...
Just browsing the documentation, and searching the list came up short... I
have some unbalanced data and was wondering if, in a "0" v "1"
classification forest, some combo of these options might yield better
predictions when the proportion of one class is low (less
2006 Feb 06
1
Classification of Imbalanced Data
Hi,
I'm looking to perform a classification analysis on an imbalanced data
set using random Forest and I'd like to reproduce the weighted random
forest analysis proposed in the Chen, Liaw & Breiman paper "Using Random
Forest to Learn Imbalanced Data"; can I use the R package randomForest
to perform such analysis? What is the easiest way to accomplish this task?
Thanks,
2004 Nov 04
4
highly biased PCA data?
Hello, supposing that I have two or three clear categories for my data,
let's say pet preference across fish, cat, dog. Let's say most people rate
their preference as being mostly one of the categories.
I want to do pca on the data to see three 'groups' of people, one group
for fish, one for cat and one for dog. I would like to see the odd person
who likes both or all three in the
2010 May 05
1
What is the default nPerm for regression in randomForest?
Could not find it in ?randomForest.
Thank you for your help!
--
Dimitri Liakhovitski
Ninah.com
Dimitri.Liakhovitski at ninah.com
2006 Nov 13
1
random forest regression
Dear all,
I am doing a regression in randomForest, using the option "sampsize" to reduce
the number of records used to produce the randomForest object.
The manual says "For classification, if sampsize is a vector of the length
the number of strata, then sampling is stratified by strata, and the
elements of sampsize indicate the numbers to be drawn from the strata". I
need my
2008 Mar 09
1
sampsize in Random Forests
Hi all,
I have a dataset where each point is assigned to a class A, B, C, or
D. Each point is also assigned to a study site. Each study site is
coded with a number ranging between 1-100. This information is stored
in the vector studySites.
I want to run randomForests using stratified sampling, so I chose the option
strata = factor(studySites)
But I am not sure how to control the number of
2008 May 21
1
How to use classwt parameter option in RandomForest
Hi,
I am trying to model a dataset with the response variable Y, which has
6 levels { Great, Greater, Greatest, Weak, Weaker, Weakest}, and
predictor variables X, with continuous and factor variables using
random forests in R. The variable Y acts like an ordinal variable, but
I recoded it as factor variable.
I ran a simulation and got OOB estimate of error rate 60%. I validated
against some
2013 May 14
0
need help for Imbalanced classification problems!!!
Hi all,
I am facing an imbalanced classification problem. That means I have a
dataset in which the ratio of majority data to minority data is 100:1 (or
more).
In addition, there are many independent variables, and this is a binary
classification problem.
The model I built gives poor predictive power for the minority data, but for
the majority data it seems to be overfitting.
Could you
2002 Sep 25
5
CART vs. Random Forest
According to Dr. Breiman, the RF should be more accurate
method than a single tree. However, the performance of each
method seems to depend on the proportion of the outcome variable
in my case. My data set is a typical classification problem
(predict bad guys). When I ran both of them with different
proportions of the outcome variable (there's a criterion to measure
the degree of bad behavior), I
2010 Jul 20
1
Random Forest - Strata
Hi all,
I have been struggling to get "strata" in randomForest to work for this.
Can I get randomForest, for each of its trees, to use ALL samples from some
strata to build the tree, while leaving some strata TOTALLY untouched as OOB?
e.g. in the example below, how can I tell RF,
- for tree 1 in the forest, to use only Site A and B to build the tree,
while using the WHOLE Site C data for the oob error
2005 Oct 25
0
Examples of "classwt", "strata", and "sampsize" in randomForest?
Just browsing the documentation, and searching the list came up short... I
have some unbalanced data and was wondering if, in a "0" v "1" classification
forest, these options might yield better predictions when the proportion
of one class is low (less than 10% in a sample of 2,000 observations).
Not sure how to specify these terms... from the docs, we have:
classwt: Priors
2011 Nov 01
1
Sample size calculations for one sided binomial exact test
I'm trying to compute sample size requirements for a binomial exact test.
We want to show that the proportion is at least 90% assuming that it is
95%, with 80% power, so any asymptotic approximations are out of the
question. I was planning on using binom.test to perform the simple test
against a prespecified value, but cannot find any functions for computing
sample size. Do any exist?
2012 Mar 03
0
Strategies to deal with unbalanced classification data in randomForest
Hello all,
I have become somewhat confused with options available for dealing
with a highly unbalanced data set (10000 in one class, 50 in the
other). As a summary I am unsure:
a) if I am performing the two class weighting methods properly,
b) if the data are too unbalanced and whether this type of analysis is
appropriate and
c) if there is any interaction between the weighting for class
imbalances
2006 Mar 29
2
missing value replacement for test data in random forest
Hi,
In R, how do I do missing value replacement for test data in random forest in the way Breiman described?
Thanks in advance
iris
2009 Mar 20
2
randomForest
Hi!
I am dealing with random forest using R.
Is there a way to sample a fixed number of rows from a dataset for use with
different trees in random forest?
To be more clear, my data set contains 1500 rows, and I am growing 500 trees
in random forest.
Is it possible to sample only 500 rows of data from the data set and use it
for different trees in the forest? I mean each tree of the forest should use