thr3ads.net - similar to: "Strategies to deal with unbalanced classification data in randomForest"

Displaying 20 results from an estimated 5000 matches similar to: "Strategies to deal with unbalanced classification data in randomForest"

Memory limits for MDSplot in randomForest package

2012 Mar 23

Hello, I am struggling to produce an MDS plot using the randomForest package with a moderately large data set. My data set has one categorical response variable, 7 predictor variables and just under 19000 observations. That means my proximity matrix is approximately 133000 by 133000 which is quite large. To train a random forest on this large a dataset I have to use my institution's high

[slightly OT] predict.randomForest and type="prob"

2011 Feb 15

Dear all, I would like to use the function randomForest to predict the probability of relocation failure of a GPS collar as a function of several environmental variables x (both factor and numeric: slope, vegetation, etc.) on a given area. The response variable y is thus success (0) / failure (1) of the relocation, and the sampling unit is the pixel of a raster map. My aim is to build a map
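A minimal sketch of the kind of call this question is about, assuming the randomForest package is installed. The data frame `dat`, its columns `slope` and `vegetation`, and the simulated response are hypothetical stand-ins, not the poster's data. With type = "prob", predict() returns one column per class holding the fraction of trees that voted for it.

```r
library(randomForest)

set.seed(1)
# Hypothetical stand-in data: y is 0 (success) or 1 (failure)
n <- 200
dat <- data.frame(
  slope = runif(n, 0, 45),
  vegetation = factor(sample(c("forest", "meadow", "scrub"), n, replace = TRUE))
)
dat$y <- factor(rbinom(n, 1, plogis(-2 + 0.08 * dat$slope)), levels = c(0, 1))

rf <- randomForest(y ~ slope + vegetation, data = dat, ntree = 200)

# One column per class; each row is the share of tree votes per class
p <- predict(rf, newdata = dat[1:5, ], type = "prob")
p_fail <- p[, "1"]  # vote share for the "failure" class
print(p_fail)
```

These values are vote fractions, so whether they can be read as calibrated probabilities is a separate question.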

new version of randomForest (4.0-7)

2004 Jan 12

Dear R users, I've just released a new version of randomForest (available on CRAN now). This version contains quite a number of new features and bug fixes, compared to versions prior to 4.0-x (and a few more since 4.0-1). For those not familiar with randomForest, it's an ensemble classifier/regression tool. Please see http://www.math.usu.edu/~adele/forests/ for more detailed information,

na.action in randomForest --- Summary

2003 Aug 05

A few days ago I asked whether there were options other than na.action=na.fail for the R port of Breiman's randomForest; the function's help page did not say anything about other options. I have since discovered that a pdf document called "The randomForest Package" and made available by Andy Liaw (who made the tool available in R---thank you) does discuss an option. It is an implementation of

Repost: Examples of "classwt", "strata", and "sampsize" in randomForest?

2005 Oct 27

"classwt" in the current version of the randomForest package doesn't work too well. (It's what was in version 3.x of the original Fortran code by Breiman and Cutler, not the one in the new Fortran code.) I'd advise against using it. "sampsize" and "strata" can be use in conjunction. If "strata" is not specified, the class labels will be used.

new version of randomForest

2002 Dec 17

A new version of the randomForest package is now available on CRAN. The DESCRIPTION is:
Package: randomForest
Title: Breiman's random forest for classification and regression
Version: 3.4-1
Depends: R (>= 1.5.0)
Author: Fortran original by Leo Breiman and Adele Cutler, R port by Andy Liaw and Matthew Wiener.
Description: Classification and regression based on a forest of trees using

Interpretation of randomForest results

2005 Jan 18

> From: luk
>
> I got the following results when I run randomForest with the
> commands below:
>
> qair <- read.table("train10.dat", header = T)
> oz.rf <- randomForest(LESION ~ ., data = qair, ntree = 220,
>     importance = TRUE)
> print(oz.rf)
>
> Call:
> randomForest.formula(x = LESION ~ ., data = qair, ntree =
> 220, importance =

randomForest question [Broadcast]

2006 Jul 26

When mtry is equal to the total number of features, you just get regular bagging (in the R package -- Breiman & Cutler's Fortran code samples variables with replacement, so you can't do bagging with that). There are cases when bagging will do better than random feature selection (i.e., RF), even in simulated data, but I'd say not very often. HTH, Andy From: Arne.Muller at
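The mtry point can be tried out with a short sketch, assuming the randomForest package is installed. The built-in iris data is an illustrative choice only. Setting mtry to the number of predictors makes every split consider all variables, which is bagging.

```r
library(randomForest)

set.seed(7)
x <- iris[, 1:4]
y <- iris$Species

# mtry = number of predictors: all variables are candidates at each split
bag <- randomForest(x, y, mtry = ncol(x), ntree = 300)

# Default mtry for classification is floor(sqrt(number of predictors))
rf <- randomForest(x, y, ntree = 300)

# Final out-of-bag error estimate for each fit
print(c(bagging = bag$err.rate[300, "OOB"],
        rf      = rf$err.rate[300, "OOB"]))
```

Which of the two does better depends on the data, so the comparison is worth repeating on the data set at hand.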

randomForest 4.3-0 released

2004 Jul 08

Dear all, Version 4.3-0 of the randomForest package is now available on CRAN (in source; binaries will follow in due course). There are some interface changes and a few new features, as well as bug fixes. For those who had used previous versions, the important things to note are: 1. there's a namespace now, and 2. some functions have been renamed. The list of changes since 4.0-7 (last

Question on class 1, 2 output for RandomForest

2005 Mar 23

The `1' and `2' columns are the error rates within those classes. E.g., the last row of the `1' column should correspond to the class.error for "-", and the last row of the `2' column to the class.error for "+". (I would have thought that that should be fairly obvious, but I guess not. It mimics what Breiman and Cutler's Fortran code does.) I suspect
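A small sketch of where these per-class columns can be inspected, assuming a current randomForest package, in which the columns of err.rate are (as far as I know) named after the class levels rather than 1 and 2. The two-class subset of iris is a stand-in for the poster's "-" / "+" data.

```r
library(randomForest)

set.seed(3)
# Hypothetical two-class problem built from iris
two <- droplevels(iris[iris$Species != "setosa", ])

rf <- randomForest(Species ~ ., data = two, ntree = 200)

# err.rate has one row per tree: overall OOB error, then one column per class
print(tail(rf$err.rate, 1))

# Per-class error from the final OOB confusion matrix, for comparison
print(rf$confusion[, "class.error"])
```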

predict.randomForest

2004 Dec 10

I have a data.frame with a series of variables tagged to a binary response ('present'/'absent'). I am trying to use randomForest to predict present/absent in a second dataset. After a lot of fiddling (using two data frames, making sure data types are the same, lots of testing with data that works such as data(iris)) I've settled on combining all my data into one data.frame

Classification of Imbalanced Data

2006 Feb 06

Hi, I'm looking to perform a classification analysis on an imbalanced data set using random forest and I'd like to reproduce the weighted random forest analysis proposed in the Chen, Liaw & Breiman paper "Using Random Forest to Learn Imbalanced Data"; can I use the R package randomForest to perform such analysis? What is the easiest way to accomplish this task? Thanks,

rpart vs. randomForest

2003 Apr 12

Greetings. I'm trying to determine whether to use rpart or randomForest for a classification tree. Has anybody tested efficacy formally? I've run both and the confusion matrix for rf beats rpart. I've looked at the rf help page and am unable to figure out how to extract the tree. But more than that I'm looking for a more comprehensive user's guide for randomForest including

Regarding variable importance in the randomForest package

2010 Mar 16

For anyone who is knowledgeable about the randomForest package in R, I have a question: When I look at the variable importance for data, I see that my response variable is included along with my predictor variables. That is, I am getting a MeanDecreaseGini for my response variable, and therefore it seems as though it is being treated as a predictor variable. my code (just in case it helps) :

randomForest crash?

2003 Apr 21

I am attempting to use randomForest to look for interesting genes in microarray data with 216 genes, 2 classes and 52 samples. My data.frame is 52x217 with the last column, V217, being the class (1 or 2). When I try lung.rf <- randomForest(V217 ~ ., data=tlSA216cda, importance= TRUE, proximity = TRUE) the GUI crashes. I am running R-1.6.2 under windo$e98, and most

randomForest parameters for image classification

2010 Nov 09

I am implementing an image classification algorithm using the randomForest package. The training data consists of 31000+ training cases over 26 variables, plus one factor predictor variable (the training class). The main issue I am encountering is very low overall classification accuracy (a lot of confusion between classes). However, I know from other classifications (including a regular decision

ecological meaning of randomForest vegetation classification?

2007 Sep 05

Hi, everyone, I haven't found anything similar in the forum, so here's my problem (I'm no expert in R nor statistics): I have a data set of 59.000 cases with 9 variables each (fractional coverage of 9 different plant types, such as deciduous broad-leaved temperate trees or evergreen tropical trees etc.), which was generated by a vegetation model. In order to evaluate the quality of

RandomForest vs. bayes & svm classification performance

2006 Jul 24

Hi This is a question regarding classification performance using different methods. So far I've tried NaiveBayes (klaR package), svm (e1071) package and randomForest (randomForest). What has puzzled me is that randomForest seems to perform far better (32% classification error) than svm and NaiveBayes, which have similar classification errors (45%, 48% respectively). A similar difference in