thr3ads.net - R help - [R] Trouble with Caret and C5.0 [Aug 2015]

Lorenzo Isella

2015-Aug-31 19:25 UTC

[R] Trouble with Caret and C5.0

Dear All,
I am trying to mine a small dataset.
Admittedly, it is a bit odd, since it is a multi-class classification
task with more than 300 different classes for about 600 observations.
Having said that, the problem is not the output of my script, but the
fact that it gets stuck, without an error message, when I use C5.0 and
caret.
I recycled another script of mine which never gave me any trouble, so
I do not know what is going on.
The small training set can be downloaded from


https://www.dropbox.com/s/4yseukqqvssvh63/training.csv?dl=0


My script is pasted at the end of this email.
C5.0 without caret completes in seconds, so I must be making a
mistake in how I use caret.
Any suggestion is appreciated.
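For readers who want to see how sparse such a class distribution is, here is a minimal sketch in base R. It uses a synthetic stand-in for the data (the data frame `toy` and its random labels are assumptions, not the real training.csv) and counts how many classes have fewer than 5 rows, i.e. too few to appear in every hold-out fold of a 5-fold split.

```r
# Synthetic stand-in for the training data: 600 rows, up to 300 class labels.
# 'toy' and its labels are hypothetical; swap in the real data frame to check it.
set.seed(123)
toy <- data.frame(
  productid = factor(paste0("fac", sample(300, 600, replace = TRUE)))
)

class_counts <- table(toy$productid)   # observations per class
n_classes    <- nlevels(toy$productid) # number of distinct classes present
n_small      <- sum(class_counts < 5)  # classes that cannot be in all 5 hold-out folds

cat("classes:", n_classes, "\n")
cat("classes with fewer than 5 rows:", n_small, "\n")
```

Running the same three lines on the real `train$productid` column would show how many classes each resample can realistically see.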

Lorenzo

####################################################

library(caret)
library(readr)
library(C50)
library(doMC)
library(digest)


train <- read_csv("training.csv")

ncores <- 2


registerDoMC(cores = ncores)


set.seed(123)


shuffle <- sample(nrow(train))

train <- train[shuffle, ]


train$productid <- as.character(train$productid)

train$productid <- paste('fac', train$productid, sep='')

train$productid <- as.factor(train$productid)

train$State <- as.factor(train$State)

train$category <- as.factor(train$category)

train$unit <- as.factor(train$unit)

for (i in seq(nrow(train))) {
    train$myname[i] <- digest(train$myname[i], algo='crc32')
}


train <- subset(train, select=-c(straincategory, description))


### this completes quickly
oneTree <- C5.0(productid ~ ., data = train, trials=10)




c50Grid <- expand.grid(trials = c(10),
                       model = c("tree"),   ## , "rules"
                       winnow = c(FALSE))   ## TRUE,




tc <- trainControl(method = "repeatedCV",
                   summaryFunction = mnLogLoss,
                   number = 5, repeats = 5, verboseIter = TRUE,
                   classProbs = TRUE)



### but this takes forever
model <- train(productid ~ ., data = train, method = "C5.0",
               trControl = tc,
               metric = "logLoss",
               ## strata=train$donation,
               ## sampsize=rep(nmin, length(levels(train$donation))),
               ## control = C5.0Control(fuzzyThreshold = T),
               maximize = FALSE,
               tuneGrid = c50Grid)
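
A rough way to reason about the extra cost of the train() call, sketched in base R under stated assumptions. The first part counts the fits implied by the settings in the script above (5 folds, 5 repeats, a one-row tuning grid, plus the final fit on all rows). The second part uses a made-up data frame (`toy`, with a hypothetical 400-value `myname` column) to show how a formula expands a many-level factor into indicator columns through model.matrix(). This is offered as one possible contributor to the slowdown, not a confirmed diagnosis.

```r
# Resampling workload implied by the trainControl() call: 5 folds x 5 repeats.
n_folds     <- 5
n_repeats   <- 5
n_grid_rows <- 1   # c50Grid has a single row
n_fits <- n_folds * n_repeats * n_grid_rows + 1   # +1 for the final fit on all rows
n_fits  # 26

# Toy illustration of formula expansion; column names and level counts are made up.
set.seed(123)
toy <- data.frame(
  y      = rnorm(600),
  myname = factor(sample(paste0("id", 1:400), 600, replace = TRUE))
)
x <- model.matrix(y ~ ., data = toy)

# One intercept column plus one indicator per non-reference level.
dim(x)
```

If the expansion turns out to matter, passing the predictors and outcome separately (the x/y interface of train()) instead of a formula is one thing to try, since it leaves factor columns as they are.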
