Have you tried using Rprof to determine where time is being spent in
the current code? Have you looked at how much memory you are using?
Are you paging? Have you run with a size 'x', then '2x', then '4x' to
see what the growth in both CPU time and memory usage is? This is
what I would do if I were trying to debug/optimize one of my scripts.
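
A minimal sketch of those checks, using base R only. `fit_once()` is a hypothetical stand-in workload (a distance matrix on simulated data), not the poster's model; in a real script its body would be a single svm() call on the first n rows of the training data. The sketch times the workload at sizes x, 2x and 4x with system.time(), profiles the largest run with Rprof(), and prints R's memory use with gc().

```r
# Hypothetical stand-in for one model fit; replace the body with a single
# svm() call on the first n rows of your training data.
fit_once <- function(n, p = 21) {
  x <- matrix(rnorm(n * p), nrow = n)
  d <- dist(x)            # work grows roughly as n^2
  invisible(sum(d))
}

# Time the workload at sizes x, 2x and 4x.
sizes <- c(1000, 2000, 4000)
timings <- data.frame(
  n       = sizes,
  elapsed = sapply(sizes, function(n) system.time(fit_once(n))[["elapsed"]])
)
print(timings)

# Profile the largest run to see where the time is spent.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
fit_once(max(sizes))
Rprof(NULL)               # stop profiling

# summaryRprof() fails if the run was too short to collect any samples,
# so guard the call.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (!is.null(prof)) print(head(prof$by.self))

# gc() reports the memory R is currently using and the maximum used so far.
print(gc())
```

If elapsed time roughly quadruples each time the size doubles, the cost is growing about quadratically, which matters a great deal before committing to a full-size run.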
Before I would run something for a day, I would understand how the
processing time increases with the size of the input file so that I
would have an idea of how long to wait.
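
One way to turn such timings into a rough wait estimate is to fit a power law on a log-log scale and extrapolate. The sketch below assumes run time grows roughly as a * n^b. The timing values are made-up placeholders, not measurements, and should be replaced with your own. The full size (about 16,000 rows) and the 100 grid points are taken from the post quoted below.

```r
# Placeholder timings (hypothetical, for illustration only): seconds per fit
# at subsample sizes x, 2x and 4x. Substitute your measured values.
timings <- data.frame(n = c(1000, 2000, 4000),
                      elapsed = c(1, 4, 16))

# Fit log(elapsed) = log(a) + b * log(n); b is the growth exponent.
fit <- lm(log(elapsed) ~ log(n), data = timings)
b <- unname(coef(fit)[2])

# Extrapolate to the full training size (about 8,000 rows per class).
full_n <- 16000
est_one_fit <- unname(exp(predict(fit, newdata = data.frame(n = full_n))))

# The grid has 10 cost values x 10 gamma values = 100 models.
n_models <- 100
est_total_hours <- est_one_fit * n_models / 3600

cat("growth exponent:", round(b, 2), "\n")
cat("estimated seconds per fit at full size:", round(est_one_fit), "\n")
cat("estimated hours for the whole grid:", round(est_total_hours, 2), "\n")
```

Training time for an SVM also depends on the cost and gamma values, so an estimate like this is an order-of-magnitude guide at best.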
On Thu, Sep 30, 2010 at 1:40 PM, Guelman, Leo <leo.guelman at rbc.com> wrote:
> Dear users,
>
> I'm working on a binary classification problem using Support Vector
> Machines (SVM). My objective is to train a series of SVM models on a
> grid of hyperparameters and then select those that maximize the AUC
> based on an independent validation sample.
>
> My attempted code is shown below. It runs well on "small" data sets, but
> when I use it on a slightly larger sample (e.g., my train data is
> composed of about 8,000 observations on each class and 21 inputs), it
> takes "forever" to run (more than 1 day already and still running). I'm
> wondering if there's any way I can optimize this code. Thanks in advance
> for any help.
>
> I'm using 64-bit R 2.11.1 on Win 7.
>
> ####Start Code####
>
> library(e1071)
> library(ROCR)
>
> ### Create grid of hyperparameters
>
> Gseq <- seq(-15, 3, 2); G <- 2^Gseq   # gamma: 2^-15, 2^-13, ..., 2^3
> Cseq <- seq(-5, 13, 2); C <- 2^Cseq   # cost:  2^-5, 2^-3, ..., 2^13
> mygrid <- expand.grid(C=C, G=G)
>
> ### Train models
>
> svm.models <- lapply(1:nrow(mygrid), function(i) {
>   # One radial-kernel SVM per (cost, gamma) pair. e1071's svm() selects
>   # the SVM variant with `type`, not `method`.
>   svm(churn.form, data = mytraindata,
>       type = "C-classification", kernel = "radial",
>       cost = mygrid[i, 1], gamma = mygrid[i, 2],
>       probability = TRUE)
> })
>
> ### Predict on test set
>
> pred.step3 <- numeric(length(svm.models))
>
> for (i in seq_along(svm.models)) {
>   # Class probabilities on the validation sample
>   pred.step1 <- predict(svm.models[[i]], myvaliddata,
>                         decision.values = FALSE, probability = TRUE)
>   pred.step2 <- prediction(
>     predictions = attr(pred.step1, "probabilities")[, 1],
>     labels = myvaliddata$churn)
>   # AUC of model i
>   pred.step3[i] <- performance(pred.step2, "auc")@y.values[[1]]
> }
>
> pred.step3
>
> ####End Code####
>
>
> Thanks,
> Leo.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?