thr3ads.net - similar to: "Cost-sensitive classification"

Displaying 20 results from an estimated 20000 matches similar to: "Cost-sensitive classification"

gbm for cost-sensitive binary classification?

2009 Jun 17

I recently used gbm for a binary classification problem. As expected, it gets very good results, based on area under the ROC curve with 7-fold cross-validation. However, the application (malware detection) is cost-sensitive: getting a FP (classifying a clean sample as a dirty one) is much worse than getting a FN (missing a dirty sample). I would like to tune the gbm model so that it is biased toward a very low FP rate. For this

rpart v. lda classification.

2003 Feb 12

I've been groping my way through a classification/discrimination problem, from a consulting client. There are 26 observations, with 4 possible categories and 24 (!!!) potential predictor variables. I tried using lda() on the first 7 predictor variables and got 24 of the 26 observations correctly classified. (Training and testing both on the complete data set --- just to get started.) I

Collapsing solution to the question discussed above: Re: multi-class classification using rpart

2005 Jan 25

You could break your 3 class problem into several (2 or 3) 2 class problems, and then use Andy's suggestion (see the CART book). There are several ways to break the problem into 2 class problems, and several ways to combine the resulting classifiers. Tom Dietterich, Jerry Friedman, Trevor Hastie and Rob Tibshirani, among others, have articles on the question, in places like Annals of

How to show which variables include in plot of classification tree

2005 Mar 18

Dear all, for my research I am learning classification now. I was trying some examples with classification tree packages, such as tree and rpart. For instance, the Pima.te dataset has 8 variables (including class=type):
    library(rpart)
    library(MASS)  # Pima.te ships with MASS, not datasets
    pima.rpart <- rpart(type ~ npreg+glu+bp+skin+bmi+ped+age, data=Pima.te, method='class')
    plot(pima.rpart, uniform=TRUE)
    text(pima.rpart)

Decision tree model using rpart ( classification

2011 Nov 04

Hi Experts, I am new to R and am using a decision tree model to get segmentation rules.
A) Using behavioural data (attributes defining customer behaviour, for example balances, number of accounts, etc.):
1. Clustering: cluster the behavioural data into a suitable number of clusters
2. Decision Tree: use an rpart classification tree to generate segmentation rules, using the cluster number (cluster id) as target

nnet classification accuracy vs. other models

2004 Mar 13

I was wondering if anybody has ever tried to compare the classification accuracy of nnet to other (rpart, tree, bagging) models. From what I know, there is no reason to expect a significant difference in classification accuracy between these models, yet in my particular case I get about a 10% error rate for the tree, rpart and bagging models and an 80% error rate for nnet, applied to the same data. Thanks.

rpart: which is correct?

2009 Aug 02

I am using rpart in classification mode and am confused about this particular model's predictions.
    > predict(fit, train[8,])
             -1         1
    8 0.5974089 0.4025911
    > predict(fit, train[8,], type="class")
    1
    Levels: -1 1
So, it seems like there is a 60% chance of being class -1 according to the "prob" output (which is the default for classification) but gives

Classification problem - rpart

2003 Apr 10

I am performing a binary classification using a classification tree. Ironically, the data themselves are 2483 tree (real biological ones) locations as described by a suite of environmental variables (slope, soil moisture, radiation load, etc.). I want to separate them from an equal number of random points. Doing EDA on the data shows that there is a substantial difference between the tree and random

Classification tree with a random variable

2006 Aug 24

Hi, I am planning on using classification trees to build a predictive model for data which includes a random variable. I intend to use the R functions 'rpart' (and potentially also 'randomForest' and 'bagging'). I have a data set with 390 data points. The response variable is binary. There are a large number of variables (>20, both categorical and continuous). The

Predicting classification error from rpart

2005 Oct 14

Hi, I think I'm missing something very obvious, but I am missing it, so I would be very grateful for help. I'm using rpart to analyse data on skull base morphology, essentially predicting sex from one or several skull base measurements. The sex of the people whose skulls are being studied is known, and lives as a factor (M,F) in the data. I want to get back predictions of gender, and

Rpart and bagging - how is it done?

2008 Mar 06

Hi there. I was wondering if somebody knows how to perform a bagging procedure on a classification tree without running the classifier with weights. Let me first explain why I need this and then give some details of what I have found out so far. I am thinking about implementing the bagging procedure in Matlab. Matlab has a simple classification tree function (in their Statistics toolbox) but

p-values for classification

2005 Jul 01

Dear All, I'm classifying some data with various methods (binary classification). I'm interpreting the results via a confusion matrix from which I calculate the sensitivity and the FDR. The classifiers are trained on 575 data points and my test set has 50 data points. I'd like to calculate p-values for obtaining <=FDR and >=sensitivity for each classifier. I was thinking about

R classification

2011 Oct 19

Hello, I am so glad to write to you. I am currently writing my M.Sc. thesis in Applied Statistics, titled "Data Mining Classifiers and Predictive Models Validation and Evaluation". I am planning to compare several DM classifiers such as NN, kNN, SVM, Dtree, and Naïve Bayes according to their predictive accuracy, interpretability, scalability, time consumption, etc. I have

Changing the classification threshold for cost function

2012 Aug 02

Dear All, I am trying to perform leave-one-out cross-validation on a logistic regression model using cv.glm from the boot package in R. As I understand it, the standard cost function
    cost <- function(r, pi = 0) mean(abs(r - pi) > 0.5)
uses a 50% risk threshold to classify cases as positive or negative and calculates the prediction error based on this. I would like to alter this threshold to,

Classification trees problem.

2011 Aug 08

Hello everyone, I'm building a classification tree with categorical explanatory variables using the rpart library, and I would like to make predictions for some input data. I don't know which function to use or how to do it. Can someone help? Here's the code that I'm using:
    library(rpart)
    fit <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis)
    plot(fit)

rpart - how to estimate the “meaningful” predictors for an outcome (in classification trees)

2010 Dec 14

Hi dear R-help members, When building a CART model (specifically a classification tree) using rpart, it is sometimes obvious that there are variables (X's) that are meaningful for predicting some of the outcome (y) variables - while other predictors are relevant for other outcome variables (y's only). *How can it be estimated, which explanatory variable is "used" for which of

Classification error rate increased by bagging - any ideas?

2006 Jul 18

Hi, I'm analysing some anthropometric data on fifty-odd skull bases. We know the gender of each skull, and we are trying to develop a predictor to identify the sex of unknown skulls. Rpart with cross-validation produces two models: one predicts gender well for males and poorly for females, and the other does the opposite (females well, males poorly). In both cases the error

comparing random forests and classification trees

2007 Jan 29

Hi, I have done an analysis using 'rpart' to construct a classification tree. I want to retain the output in tree form so that it is easily interpretable. However, I also want to compare the 'accuracy' of the tree to a Random Forest to estimate how much predictive ability is lost by using one simple tree. My understanding is that the error automatically displayed by the two

rpart error with 0-frequency factor levels (with partial fix) (PR#1378)

2002 Mar 13

(I'm sending to r-bugs because rpart is one of the recommended packages and is always installed. I'm also sending it directly to Dr. Ripley, as the maintainer.) rpart working as a classifier does not work (produces no splits) when the class indicator has no instances of one of the factor levels, as long as the factor level is not the final level. I have at least a partial fix, which I

About classification methods.

2011 Feb 11

Dear R users, I'm new to R and really don't know much. I want to classify some data (two classes, many features, and a very large data set) using R. For this, I want to use at least a Support Vector Machine, a Bayes-theory-based classifier, Discriminant Analysis, and a regression-based method. Which packages should I use, and can I compare the classifiers' results by their predictions? Thank you.
