Dear All,

I am trying to mine a small dataset. Admittedly, it is a bit odd, since it is an example of a multi-class classification task where I have more than 300 different classes for about 600 observations.

Having said that, the problem is not the output of my script, but the fact that it gets stuck, without an error message, when I use C5.0 through caret. I reused another script of mine which never gave me any headaches, so I do not know what is going on.

The small training set can be downloaded from

https://www.dropbox.com/s/4yseukqqvssvh63/training.csv?dl=0

and I paste my script at the end of this email. C5.0 without caret completes in seconds, so I must be making some mistake with caret.

Any suggestion is appreciated.

Lorenzo

####################################################

library(caret)
library(readr)
library(C50)
library(doMC)
library(digest)

train <- read_csv("training.csv")

ncores <- 2
registerDoMC(cores = ncores)

set.seed(123)
shuffle <- sample(nrow(train))
train <- train[shuffle, ]

train$productid <- as.character(train$productid)
train$productid <- paste('fac', train$productid, sep='')
train$productid <- as.factor(train$productid)
train$State <- as.factor(train$State)
train$category <- as.factor(train$category)
train$unit <- as.factor(train$unit)

for (i in seq(nrow(train))){
  train$myname[i] <- digest(train$myname[i], algo='crc32')
}

train <- subset(train, select=-c(straincategory, description))

### this completes quickly
oneTree <- C5.0(productid ~ ., data = train, trials=10)

c50Grid <- expand.grid(trials = c(10),
                       model = c( "tree" ## ,"rules"
                                 ),
                       winnow = c(## TRUE,
                                  FALSE ))

tc <- trainControl(method = "repeatedCV", summaryFunction=mnLogLoss,
                   number = 5, repeats = 5, verboseIter=TRUE,
                   classProbs=TRUE)

### but this takes forever
model <- train(productid~., data=train, method="C5.0", trControl=tc,
               metric="logLoss",## strata=train$donation,
               ## sampsize=rep(nmin, length(levels(train$donation))),
               ## control C5.0Control(fuzzyThreshold = T),
               maximize=FALSE,
               tuneGrid=c50Grid)