2011 May 12
Can ROC be used as a metric for optimal model selection for randomForest?
Dear all, I am using the "caret" package for predictor selection with a randomForest model. The following is the train call: rfFit <- train(x = trainRatios, y = trainClass, method = "rf", importance = TRUE, do.trace = 100, keep.inbag = TRUE, tuneGrid = grid, trControl = bootControl, scale = TRUE, metric = "ROC") I wanted to use ROC as the metric for variable
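For context on this question, here is a minimal sketch of how ROC is usually requested as the tuning metric in caret. It assumes a recent caret version (where the tuning-grid column is named `mtry` and `twoClassSummary` needs the pROC package installed) and uses made-up data; `x`, `y`, and `ctrl` are illustrative names, not the poster's objects. The key point, as far as I know, is that `metric = "ROC"` only works when the control object asks for class probabilities and a summary function that reports ROC.

```r
library(caret)

set.seed(42)
n <- 200
# Synthetic two-class data (an assumption for illustration only)
x <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
y <- factor(ifelse(x$a + 0.5 * rnorm(n) > 0, "yes", "no"),
            levels = c("yes", "no"))

# classProbs and twoClassSummary are what make the "ROC" column available
ctrl <- trainControl(method = "boot", number = 10,
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

fit <- train(x = x, y = y, method = "rf",
             metric = "ROC",
             tuneGrid = data.frame(mtry = 1:3),
             trControl = ctrl,
             ntree = 100)

# Resampled ROC (area under the curve) for each candidate mtry
print(fit$results[, c("mtry", "ROC")])
print(fit$bestTune)
```

The class levels must be valid R names (here "yes" and "no") because caret uses them as column names for the predicted probabilities.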
2013 Mar 06
CARET and NNET fail to train a model when the input is high dimensional
The following code fails to train a nnet model on a random dataset using caret: nR <- 700 nCol <- 2000 myCtrl <- trainControl(method="cv", number=3, preProcOptions=NULL, classProbs = TRUE, summaryFunction = twoClassSummary) trX <- data.frame(replicate(nR, rnorm(nCol))) trY <- runif(1)*trX[,1]*trX[,2]^2+runif(1)*trX[,3]/trX[,4] trY <-
2010 Sep 29
caret package version 4.63
Version 4.63 of the caret package is now on CRAN. caret can be used to tune the parameters of predictive models using resampling, estimate variable importance and visualize the results. There are also various modeling and "helper" functions that can be useful for training models. caret has wrappers to over 99 different models for classification and regression. See the package vignettes
2011 Dec 22
randomforest and AUC using 10 fold CV - Plotting results
Here is a snippet to show what I'm trying to do. library(randomForest) library(ROCR) library(caret) data(iris) iris <- iris[(iris$Species != "setosa"),] fit <- randomForest(factor(Species) ~ ., data=iris, ntree=50) train.predict <- predict(fit,iris,type="prob")[,2]
2013 Feb 10
Training with very few positives
I have a binary classification problem where the fraction of positives is very low, e.g. 20 positives in 10,000 examples (0.2%). What is an appropriate cross-validation scheme for training a classifier with very few positives? I currently have the following setup: ======================================== library(caret) tmp <- createDataPartition(Y, p = 9/10, times = 3, list = TRUE)
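One common option for this situation is stratified k-fold cross-validation, so that every fold holds a share of the rare class. The sketch below is a hand-rolled base R illustration, not the scheme the thread settled on; the label vector `Y` is synthetic (20 positives in 10,000, matching the numbers in the question) and `fold_id` is a hypothetical name.

```r
set.seed(123)

# Synthetic labels: 20 positives among 10,000 examples (0.2%)
Y <- factor(c(rep("pos", 20), rep("neg", 9980)))
k <- 10

# Assign fold ids separately within each class so the positives
# are spread evenly across the k folds
fold_id <- integer(length(Y))
for (cls in levels(Y)) {
  idx <- which(Y == cls)
  fold_id[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
}

# Rows are folds, columns are classes
print(table(fold_id, Y))
```

Fold `i` is then used as the held-out set with `which(fold_id == i)`, and the remaining rows form the training set. With only two positives per held-out fold, per-fold metrics will still be noisy, so pooling the held-out predictions across folds before scoring may be more stable.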
2010 Oct 22
Random Forest AUC
Guys, I used Random Forest on a couple of data sets to predict a binary response. In all cases, the AUC on the training set comes out as 1. Is this always the case with random forests? Can someone please clarify this? I have given a simple example, first using logistic regression and then using random forests, to explain the problem. AUC of the random forest is coming out to be
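A likely explanation for the behaviour described here is that each tree in a random forest is grown deep on a bootstrap sample, so re-predicting the training rows mostly replays what the trees have memorised. The sketch below, on synthetic data, contrasts that with the out-of-bag (OOB) predictions that `predict()` returns for a randomForest object when no new data are supplied. The helper `auc_rank` is a hypothetical name for a rank-based (Mann-Whitney) AUC written here so that no ROC package is needed.

```r
library(randomForest)

set.seed(1)
n <- 500
# Synthetic noisy binary problem (an assumption for illustration)
x <- data.frame(a = rnorm(n), b = rnorm(n))
y <- factor(ifelse(x$a + rnorm(n) > 0, "pos", "neg"))

fit <- randomForest(x, y, ntree = 200)

# Rank-based AUC: probability that a random positive outscores a random negative
auc_rank <- function(score, is_pos) {
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(score)  # ties receive average ranks
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Re-predicting the training rows: optimistic
train_auc <- auc_rank(predict(fit, x, type = "prob")[, "pos"], y == "pos")

# Omitting newdata returns out-of-bag predictions: a fairer estimate
oob_auc <- auc_rank(predict(fit, type = "prob")[, "pos"], y == "pos")

print(c(train = train_auc, oob = oob_auc))
```

The OOB figure is the one comparable to a held-out or cross-validated AUC from logistic regression; the training-set figure is not.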
2013 Nov 15
Inconsistent results between caret+kernlab versions
I'm using caret to assess classifier performance (and it's great!). However, I've found that my results differ between R2.* and R3.* - reported accuracies are reduced dramatically. I suspect that a code change to kernlab ksvm may be responsible (see version 5.16-24 here: I get very different results between caret_5.15-61 +