Liaw, Andy
2006-Feb-02 16:28 UTC
[Rd] crossvalidation in svm regression in e1071 gives incorrect results (PR#8554)
1. This is _not_ a bug in R itself. Please don't use R's bug reporting
   system for contributed packages.

2. This is _not_ a bug in svm() in `e1071'. I believe you forgot to take
   sqrt.

3. You really should use the `tot.MSE' component rather than the mean of
   the `MSE' component, but this is only a very small difference.

So, instead of spread[i] <- mean(mysvm$MSE), you should have
spread[i] <- sqrt(mysvm$tot.MSE). I get:

> spread <- rep(0,20)
> for (i in 1:20) {
+     spread[i] <- svm(y ~ x,data, cross=10)$tot.MSE
+ }
> summary(sqrt(spread[i]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.2679  0.2679  0.2679  0.2679  0.2679  0.2679

Andy

From: no228 at cam.ac.uk
>
> Full_Name: Noel O'Boyle
> Version: 2.1.0
> OS: Debian GNU/Linux Sarge
> Submission from: (NULL) (131.111.8.96)
>
>
> (1) Description of error
>
> The 10-fold CV option for the svm function in e1071 appears to give
> incorrect results for the rmse.
>
> The example code in (3) uses the example regression data in the svm
> documentation. The rmse for internal prediction is 0.24. It is expected
> the 10-fold CV rmse should be bigger, but the result obtained using the
> "cross=10" option is 0.07. When the 10-fold CV is conducted either 'by
> hand' (not shown below) or using the errorest function in ipred (shown
> below) the answer is closer to 0.27, a more reasonable value.
>
> (2) Description of system
>
> I'm using the Debian Sarge version of R:
> R : Copyright 2005, The R Foundation for Statistical Computing
> Version 2.1.0 (2005-04-18), ISBN 3-900051-07-0
>
> svm is in the e1071 package from CRAN:
> Version: 1.5-11
> Date: 2005-09-19
>
> (3) Example code illustrating the problem
>
> library(e1071)
>
> set.seed(42)
> # create data
> x <- seq(0.1, 5, by = 0.05)
> y <- log(x) + rnorm(x, sd = 0.2)
> data <- as.data.frame(cbind(y,x))
>
> # estimate model and predict input values
> mysvm <- svm(y ~ x,data)
> result <- predict(mysvm, data)
> (rmse <- sqrt(mean((result-data[,1])**2)))
> # 0.2390489
>
> # built-in 10-fold CV estimate of prediction error
> spread <- rep(0,20)
> for (i in 1:20) {
>   mysvm <- svm(y ~ x,data,cross=10)
>   spread[i] <- mean(mysvm$MSE)
> }
> summary(spread)
> #    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> # 0.06789 0.07089 0.07236 0.07310 0.07411 0.08434 (or something similar)
>
> # 10-fold CV using errorest
> library(ipred)
> mysvm <- function(formula,data) {
>   model <- svm(formula,data)
>   function(newdata) predict(model,newdata)
> }
> spread <- rep(0,20)
> for (i in 1:20) {
>   spread[i] <- errorest(y ~ x, data, model=mysvm)$error
> }
> summary(spread)
> #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> # 0.2601  0.2649  0.2673  0.2696  0.2741  0.2927
>
>
> Regards,
> Noel O'Boyle.
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
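
The point made in the reply (the `MSE' and `tot.MSE' components are on the squared scale, so a square root is needed before comparing with an rmse) can be checked with a short sketch like the one below. It reuses only calls that appear in this thread (svm() with cross=10, and the `MSE' and `tot.MSE' components) on the same toy data. The names n_rep, mean_fold_mse and cv_rmse are made up for this illustration. Note also that summary(sqrt(spread[i])) in the reply indexes a single element, which would explain why all six summary values there are identical; the sketch summarises the whole vector instead.

```r
library(e1071)  # contributed package providing svm(), as in the report

set.seed(42)
# Same toy regression data as in the quoted report
x <- seq(0.1, 5, by = 0.05)
y <- log(x) + rnorm(x, sd = 0.2)
data <- data.frame(y = y, x = x)

n_rep <- 20                        # hypothetical name: number of repeated CV runs
mean_fold_mse <- numeric(n_rep)    # hypothetical name: mean of the per-fold MSEs
cv_rmse <- numeric(n_rep)          # hypothetical name: sqrt of the total CV MSE

for (i in seq_len(n_rep)) {
  fit <- svm(y ~ x, data = data, cross = 10)
  mean_fold_mse[i] <- mean(fit$MSE)  # what the report stored: an MSE, not an rmse
  cv_rmse[i] <- sqrt(fit$tot.MSE)    # rmse derived from the total CV mean squared error
}

summary(mean_fold_mse)  # squared scale
summary(cv_rmse)        # all runs, on the scale of y
```

The second summary is the quantity to compare with the errorest() figures in the report, under the assumption that errorest() reports a root mean squared error for regression, as the numbers in the report suggest.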