thr3ads.net - similar to: "need help for Imbalanced classification problems!!!"

2006 Feb 06

Classification of Imbalanced Data

Hi, I'm looking to perform a classification analysis on an imbalanced data set using random forest, and I'd like to reproduce the weighted random forest analysis proposed in the Chen, Liaw & Breiman paper "Using Random Forest to Learn Imbalanced Data". Can I use the R package randomForest to perform such an analysis? What is the easiest way to accomplish this task? Thanks,

2009 Jul 11

hands-on classification tutorial needed...

Hi all, I am doing binary classification and want to improve the classification results on imbalanced response data. Currently the performance is poor. Are there ways I could improve the performance? I could either try different classification methodologies, or explore the data more, throw away noisy data, and manipulate the data further before sending it to the classifiers. I was

2012 Mar 03

Strategies to deal with unbalanced classification data in randomForest

Hello all, I have become somewhat confused by the options available for dealing with a highly unbalanced data set (10000 in one class, 50 in the other). In summary, I am unsure: a) if I am performing the two class-weighting methods properly, b) if the data are too unbalanced for this type of analysis to be appropriate, and c) if there is any interaction between the weighting for class imbalances

2009 Jun 17

gbm for cost-sensitive binary classification?

I recently used gbm for a binary classification problem. As expected, it gets very good results, based on area under the ROC curve with 7-fold cross-validation. However, the application (malware detection) is cost-sensitive: getting a FP (classifying a clean sample as a dirty one) is much worse than getting a FN (missing a dirty sample). I would like to tune the gbm model toward a very low FP rate. For this

2005 Jul 25

cluster

Dear listers: I have a question on the clustering methods available in R. I am trying to down-sample the majority class in a classification problem on an imbalanced dataset. Since I don't want to lose information in the original dataset, I don't want to use naive down-sampling: I think using clustering on the majority class' side to select "representative" samples might

2019 Apr 22

randomForestSRC 2.9.0 is now available

Dear useRs: It's been some time since we last sent out an announcement, so this one will cover more than just the last update. The latest release of randomForestSRC is now available on CRAN at: https://CRAN.R-project.org/package=randomForestSRC The GitHub repository, through which we prefer to receive bug reports, is at: https://github.com/kogalur/randomForestSRC If you do find issues,

2004 May 12

Random Forest with highly imbalanced data

Hi group, I am trying to do a RF with approx 250,000 cases. My objective is to determine the risk factors of a person being readmitted to hospital (response=1) or not (response=0). Only 10%, or 25,000 cases, were readmitted. I've heard about the down-sampling and class-weight approaches and am wondering if R can do it. Even some references to articles will help. From the statistical point

2005 Jan 18

Interpretation of randomForest results

> From: luk
>
> I got the following results when I ran randomForest with the commands below:
>
> qair <- read.table("train10.dat", header = T)
> oz.rf <- randomForest(LESION ~ ., data = qair, ntree = 220, importance = TRUE)
> print(oz.rf)
>
> Call:
> randomForest.formula(x = LESION ~ ., data = qair, ntree = 220, importance =

2006 Jan 25

imbalanced classes

Hi Andy, I know this topic has been discussed before on R-help, but I was wondering if you could offer some advice specific to my application. I'm using the R random forest package to compare two classes of data; the number of cases in each class is relatively low, 28 in class 1 and 9 in class 2. I'd really like to use the R environment to analyze this data, however I'm finding it

2010 Jul 14

question about SVM in e1071

Hi, I have a question about the parameter C (cost) in the svm function in e1071. I thought a larger C is more prone to overfitting than a smaller C, and hence leads to more support vectors. However, using the Wisconsin breast cancer example at the link http://planatscher.net/svmtut/svmtut.html I found that the largest cost has the fewest support vectors, which is contrary to what I expected. Please see the scripts

2012 Oct 14

Is there any R package that contains Rusboost based on Adaboost.m2?

Hi, I have been searching everywhere for an implementation of those algorithms, but I have only found them in Matlab and in the literature. I noticed a package called 'ada' on CRAN, but it is not for multiclass problems. I would be happy with just Adaboost.m2, Smoteboost over adaboost.m2, or any other combination that could account for imbalanced multiclass classification problems. Thanks!

2005 Apr 21

How to know if a classification tree is predictive or not?

Hello, I would like to know how to tell whether a classification tree is predictive or not. Is it sufficient to analyse the results of cross-validation? Thanks for your help, Laure Maton

2006 Aug 24

Classification tree with a random variable

Hi, I am planning on using classification trees to build a predictive model for data which includes a random variable. I intend to use the R functions 'rpart' (and potentially also 'randomForest' and 'bagging'). I have a data set with 390 data points. The response variable is binary. There are a large number of variables (>20, both categorical and continuous). The

2006 Jan 04

Looking for packages to do Feature Selection and Classification

Hi All, Sorry if this is a repost (a quick browse didn't give me the answer). I wonder if there are packages that can do the feature selection and classification at the same time. For instance, I am using SVM to classify my samples, but it's easy to get overfitted if using all of the features. Thus, it is necessary to select "good" features to build an optimum hyperplane (?).

2007 Feb 21

Course*** R/S+: Fundamentals and Programming Techniques - Princeton, March 1-2

XLSolutions Corporation is proud to announce our March 2007 R/S: Fundamentals and Programming Techniques - in Princeton March 1-2, 2007 : http://www.xlsolutions-corp.com/Rfund.htm This two-day beginner to intermediate R/S-plus course focuses on a broad spectrum of topics, from reading raw data to a comparison of R and S. We will learn the essentials of data manipulation, graphical

2011 Oct 19

R classification

Hello, I am so glad to write to you. I am now writing my M.Sc. thesis in Applied Statistics, titled "Data Mining Classifiers and Predictive Models Validation and Evaluation". I am planning to compare several DM classifiers such as NN, kNN, SVM, Dtree, and Naïve Bayes according to their predictive accuracy, interpretability, scalability, time consumption, etc. I have

2006 Jul 24

RandomForest vs. bayes & svm classification performance

Hi, this is a question regarding classification performance using different methods. So far I've tried NaiveBayes (klaR package), svm (e1071 package) and randomForest (randomForest package). What has puzzled me is that randomForest seems to perform far better (32% classification error) than svm and NaiveBayes, which have similar classification errors (45%, 48% respectively). A similar difference in

2013 Nov 06

R help-classification accuracy of DFA and RF using caret

Hi, I am a graduate student applying published R scripts to compare the classification accuracy of 2 predictive models, one built using discriminant function analysis and one using random forests (webpage link for these scripts is provided below). The purpose of these models is to predict the biotic integrity of streams. Specifically, I am trying to compare the classification accuracy (i.e.,

2002 Apr 02

random forests for R

Hi all, There is now a package available on CRAN that provides an R interface to Leo Breiman's random forest classifier. Basically, random forest does the following:
1. Select ntree, the number of trees to grow, and mtry, a number no larger than the number of variables.
2. For i = 1 to ntree:
3. Draw a bootstrap sample from the data. Call those not in the bootstrap sample the
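
The bootstrap draw in the excerpt above can be illustrated with a minimal sketch. This is plain Python rather than the randomForest package's own code, and the function name `bootstrap_split` and the fixed seed are hypothetical choices made only for illustration. It draws as many case indices as there are cases, with replacement, and collects the cases that were never drawn (commonly called the out-of-bag cases).

```python
import random


def bootstrap_split(n_cases, seed=0):
    """Return (in_bag, left_out) index lists for one bootstrap draw.

    Hypothetical helper for illustration; not part of any package.
    """
    rng = random.Random(seed)
    # Sample n_cases indices with replacement: the bootstrap sample.
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    drawn = set(in_bag)
    # Cases never drawn are left out of this tree's training sample.
    left_out = [i for i in range(n_cases) if i not in drawn]
    return in_bag, left_out


in_bag, left_out = bootstrap_split(10)
print("bootstrap sample:", in_bag)
print("left out:", left_out)
```

In a full random forest this draw would be repeated once per tree, and the left-out cases would typically be used to estimate that tree's error.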
