Hi everybody,

I am new to e1071 and to SVMs. I am trying to understand the performance of SVMs, but I have run into a situation that doesn't make sense to me. I have added the R code so you can see what I have done:

library(e1071)

set.seed(1234)
data <- data.frame(rbind(matrix(rnorm(1500, mean = 10, sd = 5), ncol = 10),
                         matrix(rnorm(1500, mean = 5, sd = 5), ncol = 10)))
class <- as.factor(rep(1:2, each = 150))
data <- cbind(data, class)

tuned <- best.svm(class ~ ., data = data, kernel = "linear",
                  cost = seq(0.24, 0.44, by = 0.01),
                  tunecontrol = tune.control(cross = 300))

# test with the training data
predicts <- predict(tuned, data, probability = TRUE, decision.values = TRUE)
tab <- table(predicts, data$class)
tab

This is what I get:

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  linear
       cost:  0.26
      gamma:  0.1

Number of Support Vectors:  61

But when I try cost = 0.31, I get a lower misclassification error rate than I get with cost = 0.26.

Is this difference because the error used while tuning is different from the misclassification value?

Thanks in advance.
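For reference, this is roughly how I compare the two costs on the training data (continuing from the code above; the helper err() and the name fit31 are just names made up for this example):

# Refit at cost = 0.31 with everything else the same, then compare
# training-set error rates with the model that best.svm() picked.
fit31 <- svm(class ~ ., data = data, kernel = "linear", cost = 0.31)

err <- function(m) mean(predict(m, data) != data$class)
err(tuned)   # cost = 0.26, chosen by best.svm()
err(fit31)   # cost = 0.31, fit manually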
Mark Leeds
2012-Aug-19 20:07 UTC
[R] e1071 - tuning is not giving the best within the range
Hi: I can't go into all the details (Lutz Hamel has a very nice intro book on SVMs, and I wouldn't do the details justice anyway), but the objective function of an SVM is to maximize the margin (think of the margin as the amount of separation between the two classes in a two-class problem). The objective function includes a penalty for being wrong when classifying, but the "wrongness" is measured by how far on the wrong side a point falls, not as a 0-1 loss. So, yes, the objective function is not minimizing the classification error rate, and the cost parameter penalizes how far a point can be on the wrong side of the hyperplane (not a 0-1 type cost).

I'm not sure what you meant when you said you put in a cost of 0.31, but if you passed that parameter to svm(), kept everything else the same, and obtained a better confusion matrix than the one the tuned SVM gives you, that is certainly possible: the tuning step selects the cost with the lowest cross-validation error, which need not coincide with the misclassification rate you compute on the training data itself. There is a sketch at the end of this message showing one way to see the two error measures side by side.

Definitely try to get your hands on Lutz's book for a much better explanation. Or let's hope someone else chimes in.

Mark

On Sun, Aug 19, 2012 at 3:02 PM, delf <aysem_cyp@hotmail.com> wrote:
> Is this difference because the error used while tuning is different from
> the misclassification value?
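P.S. Here is a minimal sketch of the comparison I mean, assuming e1071. tune.svm() does the same search as best.svm() but also returns the per-cost cross-validation errors in its $performances component; the names obj, costs, and train.err are just for illustration.

library(e1071)

set.seed(1234)
data <- data.frame(rbind(matrix(rnorm(1500, mean = 10, sd = 5), ncol = 10),
                         matrix(rnorm(1500, mean = 5, sd = 5), ncol = 10)))
data$class <- as.factor(rep(1:2, each = 150))

costs <- seq(0.24, 0.44, by = 0.01)

# Cross-validation error per cost, as used by the tuning step.
# cross = 300 on 300 rows is leave-one-out, matching the original post.
obj <- tune.svm(class ~ ., data = data, kernel = "linear", cost = costs,
                tunecontrol = tune.control(cross = 300))

# Training-set misclassification rate per cost, for comparison.
train.err <- sapply(costs, function(co) {
  fit <- svm(class ~ ., data = data, kernel = "linear", cost = co)
  mean(predict(fit, data) != data$class)
})

cbind(obj$performances[, c("cost", "error")], train.error = train.err)

The cost with the smallest "error" column (what the tuning picks) can easily differ from the cost with the smallest "train.error" column (what the confusion matrix on the training data measures).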