similar to: hands-on classification tutorial needed...

Displaying 20 results from an estimated 7000 matches similar to: "hands-on classification tutorial needed..."

2006 Feb 06
1
Classification of Imbalanced Data
Hi, I'm looking to perform a classification analysis on an imbalanced data set using random forest, and I'd like to reproduce the weighted random forest analysis proposed in the Chen, Liaw & Breiman paper "Using Random Forest to Learn Imbalanced Data". Can I use the R package randomForest to perform such an analysis? What is the easiest way to accomplish this task? Thanks,
2011 Jan 24
5
Train error:: subscript out of bounds
Hi, I am trying to construct a svmpoly model using the "caret" package (please see code below). Using the same data and without changing any settings, I am only changing the seed value. Sometimes it constructs the model successfully, and sometimes I get an "Error in indexes[[j]] : subscript out of bounds". For example, when I set the seed to 357, the following code produced results only for 8
2012 Nov 29
1
Help with this error "kernlab class probability calculations failed; returning NAs"
I have never been able to get class probabilities to work and I am relatively new to using these tools, and I am looking for some insight as to what may be wrong. I am using caret with kernlab/ksvm. I will simplify my problem to a basic data set which produces the same problem. I have read the caret vignettes as well as documentation for ?train. I appreciate any direction you can give. I
2009 Jun 24
2
[Classification] lifting score in R
Hi all, Could anybody give me some pointers to Cross Validation using Lifting Score as error function, as commonly used in data-mining and classification field in marketing and e-commerce research? Thanks!
2010 Nov 23
5
cross validation using e1071:SVM
Hi everyone, I am trying to do cross validation (10 fold CV) using the e1071::svm method. I know that there is an option ("cross") for cross validation, but I still wanted to write a function to generate cross-validation indices using the pls::cvsegments method. Code (at the end) is working fine, but sometimes caret:confusionMatrix
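For readers following this thread, a minimal sketch of generating 10-fold cross-validation indices in base R. It is not the poster's function and does not use pls::cvsegments; the built-in `iris` data, the seed, and the names `fold_id` and `test_idx_by_fold` are assumptions chosen for illustration. The model-fitting step is left as a comment because it depends on the e1071 package being installed.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
n <- nrow(iris)                  # iris stands in for the poster's data (assumption)
k <- 10                          # number of folds

# Shuffle a repeated 1..k label vector so each row lands in exactly one fold
fold_id <- sample(rep(seq_len(k), length.out = n))

# List of k integer vectors: the held-out row indices for each fold
test_idx_by_fold <- split(seq_len(n), fold_id)

for (i in seq_len(k)) {
  test_idx  <- test_idx_by_fold[[i]]
  train_idx <- setdiff(seq_len(n), test_idx)
  # Fit and evaluate here, for example (requires the e1071 package):
  # fit  <- e1071::svm(Species ~ ., data = iris[train_idx, ])
  # pred <- predict(fit, iris[test_idx, ])
}
```

Because the fold labels are shuffled rather than drawn independently, fold sizes differ by at most one row and every row is held out exactly once.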
2009 Jun 17
1
gbm for cost-sensitive binary classification?
I recently used gbm for a binary classification problem. As expected, it gets very good results, based on area under the ROC curve with 7-fold cross validation. However, the application (malware detection) is cost-sensitive: getting a FP (classifying a clean sample as a dirty one) is much worse than getting a FN (missing a dirty sample). I would like to tune the gbm model toward a very low FP rate. For this
2011 Apr 29
6
Beginning with a Program of SVR
Hi: I'm starting research on Support Vector Regression. I want to obtain a model to predict a property A from a set of properties B, C, D, ... This problem is very common, for example in QSAR models. I would like to know of some examples and packages that could help me with this. I know about caret and e1071, but I don't know if these packages can work with continuous variables.
2009 Jun 19
2
good boosting tutorial and package in R?
Hi all, Could you please give me some pointers about what's the best boosting package in R currently, in terms of classification accuracy? And any pointers about tutorials and study materials to ease the learning curve will be greatly appreciated! Thank you! p.s. Does anybody happen to know of boosting implemented in other languages such as Matlab? Are they good in terms of accuracy? What
2012 Nov 23
1
caret train and trainControl
I am used to packages like e1071 where you have a tune step and then pass your tunings to train. It seems with caret, tuning and training are both handled by train. I am using train and trainControl to find my hyper parameters like so: MyTrainControl=trainControl( method = "cv", number=5, returnResamp = "all", classProbs = TRUE ) rbfSVM <- train(label~., data =
2009 May 15
3
Using sample to create Training and Test sets
Forgive the newbie question, I want to select random rows from my data.frame to create a test set (which I can do) but then I want to create a training set using what's left over. Example code: acc <- read.table("accOUT.txt", header=T, sep = ",", row.names=1) #select 400 random rows in data training <- acc[sample(1:nrow(acc), 400, replace=TRUE),] #try to get what's left
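One common way to get "what's left over" is negative indexing, sketched below under some assumptions: the data frame here is simulated rather than read from accOUT.txt, the `id` and `value` columns are hypothetical, and sampling is done without replacement so that no row is drawn twice and the two sets do not overlap.

```r
set.seed(42)                                   # arbitrary seed for reproducibility

# Simulated stand-in for the poster's `acc` data frame (assumption)
acc <- data.frame(id = 1:1000, value = rnorm(1000))

# Draw 400 distinct row indices; sample() defaults to replace = FALSE
test_idx <- sample(nrow(acc), 400)

test     <- acc[test_idx, ]                    # the 400 sampled rows
training <- acc[-test_idx, ]                   # negative indexing keeps every other row
```

With `replace=TRUE`, as in the quoted snippet, the same row can be selected more than once, so the sampled set may hold fewer than 400 distinct rows.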
2009 Jun 19
3
please recommend hands-on books on classification, data-mining and machine learning with R?
Hi all, Could anybody please recommend some hands-on books on classification, data-mining and machine learning with R? I would like to get a very good understanding of the statistical tools that are used in these areas, while reducing the learning curve. Thank you!
2013 May 14
0
need help for Imbalanced classification problems!!!
Hi all, I am facing an imbalanced classification problem: I have a dataset in which the ratio of majority data to minority data is 100:1 (or more). In addition, there are many independent variables, and this is a binary classification problem. The model I built gives poor predictive power for the minority class, but for the majority class it seems to be overfitting. Could you
2012 Oct 14
1
Is there any R package that contains Rusboost based on Adaboost.m2?
Hi, I have been searching everywhere for an implementation of those algorithms, but I have only seen them in Matlab and in the literature. I noticed a package called 'ada' on CRAN but it is not for multi-class problems. I would be happy with just Adaboost.m2, Smoteboost over adaboost.m2 or any other combination that could account for imbalanced multiclass classification problems. Thanks!
2010 May 21
1
Question regarding GBM package
Dear R expert I have come across the GBM package for R and it seemed appropriate for my research. I am trying to predict the number of FPGA resources required by a Software Function if it were mapped onto hardware. As input I use software metrics (a lot of them). I already use several regression techniques, and the graphs I produce with GBM look promising. Now my question... I see that the
2010 Jun 08
2
cross-validation
Hi, I want to do leave-one-out cross-validation for multinomial logistic regression in R. I fitted the multinomial logistic regression with the nnet package. How do I do the validation, and with which function? The response variable has 7 levels. Please help me. Thanks a lot, Azam
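A rough sketch of leave-one-out cross-validation with `nnet::multinom`, assuming the nnet package is available. The built-in `iris` data (3 response levels, not the poster's 7) and the names `pred` and `loo_accuracy` are stand-ins chosen for illustration, so this shows the loop structure only.

```r
library(nnet)                      # provides multinom()

n <- nrow(iris)                    # iris stands in for the poster's data (assumption)
pred <- character(n)               # one held-out prediction per row

for (i in seq_len(n)) {
  # Refit the model on every row except row i
  fit <- multinom(Species ~ ., data = iris[-i, ], trace = FALSE)
  # Predict the class of the single held-out row
  pred[i] <- as.character(
    predict(fit, newdata = iris[i, , drop = FALSE], type = "class")
  )
}

# Proportion of held-out rows classified correctly
loo_accuracy <- mean(pred == iris$Species)
```

The loop refits the model once per observation, so run time grows with the number of rows.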
2007 Dec 14
2
train nnet
Hi R-helpers, Can some one tell me how to train 'mynn' of this type?: mynn <- nnet(y ~ x1 + ..+ x8, data = lgist, size = 2, rang = 0.1, decay = 5e-4, maxit = 200) I assume that this nn is untrained, and to train I have to split the original data into train:test data set, do leave-one-out refitting to refine the weights (please straighten this up if I was wrong). I just don't know
2005 Jul 25
1
cluster
Dear listers: Here I have a question on clustering methods available in R. I am trying to down-sample the majority class in a classification problem on an imbalanced dataset. Since I don't want to lose information in the original dataset, I don't want to use naive down-sampling: I think using clustering on the majority class' side to select "representative" samples might
2008 Sep 18
1
caret package: arguments passed to the classification or regression routine
Hi, I am having problems passing arguments to method="gbm" using the train() function. I would like to train gbm using the laplace distribution or the quantile distribution. here is the code I used and the error: gbm.test <- train(x.enet, y.matrix[,7], method="gbm", distribution=list(name="quantile",alpha=0.5), verbose=FALSE,
2020 Oct 23
5
How to shade area between lines in ggplot2
Hello, I am running SVM and showing the results with ggplot2. The results include the decision boundaries, which are two dashed lines parallel to a solid line. I would like to remove the dashed lines and use a shaded area instead. How can I do that? Here is the code I wrote.. ``` library(e1071) library(ggplot2) set.seed(100) x1 = rnorm(100, mean = 0.2, sd = 0.1) y1 = rnorm(100, mean = 0.7, sd =
2005 Jul 01
1
p-values for classification
Dear All, I'm classifying some data with various methods (binary classification). I'm interpreting the results via a confusion matrix from which I calculate the sensitivity and the FDR. The classifiers are trained on 575 data points and my test set has 50 data points. I'd like to calculate p-values for obtaining <=FDR and >=sensitivity for each classifier. I was thinking about