Displaying 20 results from an estimated 5000 matches similar to: "Strategies to deal with unbalanced classification data in randomForest"
2012 Mar 23
1
Memory limits for MDSplot in randomForest package
Hello,
I am struggling to produce an MDS plot using the randomForest package
with a moderately large data set. My data set has one categorical
response variable, 7 predictor variables and just under 19000
observations. That means my proximity matrix is approximately 133000
by 133000 which is quite large. To train a random forest on this large
a dataset I have to use my institution's high
2011 Feb 15
1
[slightly OT] predict.randomForest and type="prob"
Dear all ,
I would like to use the function randomForest to predict the probability
of relocation failure of a GPS collar as a function of several
environmental variables x (both factor and numeric: slope, vegetation,
etc.) on a given area. The response variable y is thus success
(0)/failure(1) of the relocation, and the sampling unit is the pixel of
a raster map. My aim is to build a map
2004 Jan 12
0
new version of randomForest (4.0-7)
Dear R users,
I've just released a new version of randomForest (available on CRAN now).
This version contains quite a number of new features and bug fixes,
compared to versions prior to 4.0-x (and a few more since 4.0-1).
For those not familiar with randomForest, it's an ensemble
classifier/regression tool. Please see
http://www.math.usu.edu/~adele/forests/ for more detailed information,
2004 Jan 12
0
new version of randomForest (4.0-7)
Dear R users,
I've just released a new version of randomForest (available on CRAN now).
This version contains quite a number of new features and bug fixes,
compared to versions prior to 4.0-x (and a few more since 4.0-1).
For those not familiar with randomForest, it's an ensemble
classifier/regression tool. Please see
http://www.math.usu.edu/~adele/forests/ for more detailed information,
2003 Aug 05
1
na.action in randomForest --- Summary
A few days ago I asked whether there were options other than
na.action=na.fail for the R port of Breiman's randomForest; the function's
help page did not say anything about other options.
I have since discovered that a pdf document called "The randomForest
Package" and made available by Andy Liaw (who made the tool available in
R---thank you) does discuss an option. It is an implementation of
2005 Oct 27
1
Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?
"classwt" in the current version of the randomForest package doesn't work
too well. (It's what was in version 3.x of the original Fortran code by
Breiman and Cutler, not the one in the new Fortran code.) I'd advise
against using it.
"sampsize" and "strata" can be used in conjunction. If "strata" is not
specified, the class labels will be used.
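
As a rough illustration of the advice above, the sketch below down-samples the majority class by passing one sample size per class to "sampsize" while stratifying on the class labels. The simulated data, the object names, and the choice of the minority-class count as the per-class sample size are assumptions made for this example; they are not taken from the original post.

```r
library(randomForest)

set.seed(1)

# Hypothetical imbalanced data: 950 "neg" cases and 50 "pos" cases
n_neg <- 950
n_pos <- 50
x <- data.frame(
  a = c(rnorm(n_neg), rnorm(n_pos, mean = 1.5)),
  b = c(rnorm(n_neg), rnorm(n_pos, mean = 1.5))
)
y <- factor(c(rep("neg", n_neg), rep("pos", n_pos)))

# Draw the same number of cases from each class for every tree.
# sampsize takes one entry per stratum, in the order of levels(y).
n_min <- min(table(y))
rf_bal <- randomForest(x, y,
                       strata = y,
                       sampsize = c(n_min, n_min),
                       ntree = 200)

# OOB confusion matrix with per-class error rates
print(rf_bal$confusion)
```

Comparing the class.error column with that of a fit without "sampsize" is one way to see whether the balanced sampling helps the minority class on a given data set.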
2002 Dec 17
0
new version of randomForest
A new version of the randomForest package is now available on CRAN. The
DESCRIPTION is:
Package: randomForest
Title: Breiman's random forest for classification and regression
Version: 3.4-1
Depends: R (>= 1.5.0)
Author: Fortran original by Leo Breiman and Adele Cutler, R port by Andy
Liaw and Matthew Wiener.
Description: Classification and regression based on a forest of trees using
2005 Jan 18
1
Interpretation of randomForest results
> From: luk
>
> I got the following results when I ran randomForest with the
> commands below:
>
> qair <- read.table("train10.dat", header = T)
> oz.rf <- randomForest(LESION ~ ., data = qair, ntree = 220,
> importance = TRUE)
> print(oz.rf)
>
> Call:
> randomForest.formula(x = LESION ~ ., data = qair, ntree =
> 220, importance =
2006 Jul 26
0
randomForest question [Broadcast]
When mtry is equal to total number of features, you just get regular bagging
(in the R package -- Breiman & Cutler's Fortran code samples variables with
replacement, so you can't do bagging with that). There are cases when
bagging will do better than random feature selection (i.e., RF), even in
simulated data, but I'd say not very often.
HTH,
Andy
From: Arne.Muller at
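
A minimal sketch of the point about mtry, assuming the built-in iris data as a stand-in (it is not the data discussed in the thread): setting mtry to the number of predictors makes every split consider all variables, which corresponds to bagging, and the OOB error can then be compared with the default random feature selection.

```r
library(randomForest)

set.seed(42)

# Number of predictors (all columns except the response, Species)
p <- ncol(iris) - 1

# mtry = p: every variable is a candidate at each split, i.e. bagging
bag <- randomForest(Species ~ ., data = iris, mtry = p, ntree = 200)

# Default mtry (floor(sqrt(p)) for classification): random forest proper
rf <- randomForest(Species ~ ., data = iris, ntree = 200)

# OOB error rate after the last tree for both fits
print(c(bagging = bag$err.rate[200, "OOB"],
        rf      = rf$err.rate[200, "OOB"]))
```

Which of the two comes out ahead depends on the data and the seed, so a single run like this is only indicative.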
2004 Jul 08
0
randomForest 4.3-0 released
Dear all,
Version 4.3-0 of the randomForest package is now available on CRAN (in
source; binaries will follow in due course). There are some interface
changes and a few new features, as well as bug fixes. For those who had
used previous versions, the important things to note are: 1. there's a
namespace now, and 2. some functions have been renamed. The list of changes
since 4.0-7 (last
2004 Jul 08
0
randomForest 4.3-0 released
Dear all,
Version 4.3-0 of the randomForest package is now available on CRAN (in
source; binaries will follow in due course). There are some interface
changes and a few new features, as well as bug fixes. For those who had
used previous versions, the important things to note are: 1. there's a
namespace now, and 2. some functions have been renamed. The list of changes
since 4.0-7 (last
2005 Mar 23
0
Question on class 1, 2 output for RandomForest
The `1' and `2' columns are the error rates within those classes. E.g., the
last row of the `1' column should correspond to the class.error for "-", and
the last row of the `2' column to the class.error for "+". (I would
have thought that that should be fairly obvious, but I guess not. It mimics
what Breiman and Cutler's Fortran code does.) I suspect
2004 Dec 10
1
predict.randomForest
I have a data.frame with a series of variables tagged to a binary
response ('present'/'absent'). I am trying to use randomForest to
predict present/absent in a second dataset. After a lot of fiddling
(using two data frames, making sure data types are the same, lots of
testing with data that works such as data(iris)) I've settled on
combining all my data into one data.frame
2006 Feb 06
1
Classification of Imbalanced Data
Hi,
I'm looking to perform a classification analysis on an imbalanced data
set using random Forest and I'd like to reproduce the weighted random
forest analysis proposed in the Chen, Liaw & Breiman paper "Using Random
Forest to Learn Imbalanced Data"; can I use the R package randomForest
to perform such analysis? What is the easiest way to accomplish this task?
Thanks,
2003 Apr 12
5
rpart vs. randomForest
Greetings. I'm trying to determine whether to use rpart or randomForest
for a classification tree. Has anybody tested efficacy formally? I've
run both and the confusion matrix for rf beats rpart. I've looked at
the rf help page and am unable to figure out how to extract the tree.
But more than that I'm looking for a more comprehensive user's guide
for randomForest including
2010 Mar 16
1
Regarding variable importance in the randomForest package
For anyone who is knowledgeable about the randomForest package in R, I have
a question:
When I look at the variable importance for data, I see that my response
variable is included along with my predictor variables. That is, I am
getting a MeanDecreaseGini for my response variable, and therefore it seems
as though it is being treated as a predictor variable.
my code (just in case it helps) :
2003 Apr 21
2
randomForest crash?
I am attempting to use randomForests to look for interesting genes in
microarray data with 216 genes, 2 classes and 52 samples. My data.frame
is 52x217 with the last column, V217 being the class(1 or 2).
When I try
lung.rf <- randomForest(V217 ~ ., data=tlSA216cda, importance=
TRUE, proximity = TRUE)
the GUI crashes.
I am running R-1.6.2 under windo$e98, and most
2010 Nov 09
1
randomForest parameters for image classification
I am implementing an image classification algorithm using the
randomForest package. The training data consists of 31000+ training
cases over 26 variables, plus one factor predictor variable (the
training class). The main issue I am encountering is very low overall
classification accuracy (a lot of confusion between classes). However, I
know from other classifications (including a regular decision
2007 Sep 05
1
ecological meaning of randomForest vegetation classification?
Hi, everyone,
I haven't found anything similar in the forum, so here's my problem (I'm no
expert in R nor statistics):
I have a data set of 59,000 cases with 9 variables each (fractional
coverage of 9 different plant types, such as deciduous broad-leaved
temperate trees or evergreen tropical trees etc.), which was generated by a
vegetation model.
In order to evaluate the quality of
2006 Jul 24
2
RandomForest vs. bayes & svm classification performance
Hi
This is a question regarding classification performance using different methods.
So far I've tried NaiveBayes (klaR package), svm (e1071 package) and
randomForest (randomForest). What has puzzled me is that randomForest seems to
perform far better (32% classification error) than svm and NaiveBayes, which
have similar classification errors (45%, 48% respectively). A similar
difference in