Hi all,

I am using the "svm" command in the e1071 package. Does it have an automatic way of setting the "cost" parameter? I tried a few values for "cost", but I hope there is a systematic way of obtaining the best value.

I noticed that there is a "cross" (cross-validation) parameter in the "svm" function, but I did not see how it can be used to optimize "cost".

By the way, what does a training error of 0 combined with a high testing error mean? Varying "cross=5", "cross=10", etc. does not change the training or testing error at all. How can I improve this?

Thanks a lot!

M.
Liaw, Andy
2006-Feb-28 12:14 UTC
[R] does svm have a CV to obtain the best "cost" parameter?
From: Michael
> Hi all,
>
> I am using the "svm" command in the e1071 package.
>
> Does it have an automatic way of setting the "cost" parameter?

See ?best.svm in that package.

> I changed a few values for the "cost" parameter but I hope there is a
> systematic way of obtaining the best "cost" value.
>
> I noticed that there is a "cross" (Cross validation) parameter in the
> "svm" function.
>
> But I did not see how it can be used to optimize the "cost" parameter.
>
> By the way, what does a 0 training error and a high testing error mean?
> Varying "cross=5", or "cross=10", etc. does not change the training error
> and testing error at all. How to improve?

Overfitting, which varying the validation method will not solve.

Andy

> Thanks a lot!
>
> M.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
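The two answers above can be sketched in R as follows. This is only an illustration under stated assumptions: it requires the e1071 package, uses the built-in iris data as a stand-in (the poster's x and y are not shown), and the grid 2^(-5:5) is an arbitrary choice. The first part shows that the "cross" argument of svm() reports a cross-validated accuracy without changing the fitted model; the second part uses tune.svm(), the grid search behind best.svm(), to choose "cost" by 10-fold CV.

```r
library(e1071)

set.seed(1)  # tune() draws its CV folds with R's random number generator

# Part 1: `cross` adds a CV accuracy estimate but does not alter the fit.
fit5  <- svm(Species ~ ., data = iris, kernel = "linear", cost = 1, cross = 5)
fit10 <- svm(Species ~ ., data = iris, kernel = "linear", cost = 1, cross = 10)
same_fit <- all(predict(fit5, iris) == predict(fit10, iris))
same_fit             # the final model is trained on all rows either way
fit5$tot.accuracy    # 5-fold CV accuracy reported alongside the model
fit10$tot.accuracy   # 10-fold CV accuracy

# Part 2: grid search over cost, scored by 10-fold CV.
obj <- tune.svm(Species ~ ., data = iris,
                kernel = "linear",
                cost = 2^(-5:5),
                tunecontrol = tune.control(sampling = "cross", cross = 10))

obj$best.parameters         # cost value with the lowest CV error
obj$best.performance        # the corresponding CV error
obj$performances            # full table: cost, error, dispersion
best_fit <- obj$best.model  # svm refitted on all data with the best cost
```

best.svm() takes the same arguments and returns only the refitted model, so tune.svm() is the more useful call when the whole error table is of interest.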
Liaw, Andy
2006-Feb-28 21:47 UTC
[R] does svm have a CV to obtain the best "cost" parameter?
You might find http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf helpful. Parameter tuning is essential for avoiding overfitting.

Andy

-----Original Message-----
From: Michael [mailto:comtech.usa@gmail.com]
Sent: Tuesday, February 28, 2006 3:30 PM
To: Liaw, Andy
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] does svm have a CV to obtain the best "cost" parameter?

Hi Andy,

Thanks a lot for your answer! So what do I do if the model overfits?

Thanks a lot!
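As a rough illustration of why tuning matters for overfitting, the sketch below compares a deliberately extreme RBF setting with one chosen by a coarse grid search on exponentially spaced values, in the spirit of the linked guide. Everything specific here is an assumption made for the example: the iris data, the 100/50 split, the extreme values gamma = 100 and cost = 1000, and the grid itself. No particular error values are claimed; run it to see the gap between training and test error for each model.

```r
library(e1071)

set.seed(42)

# Hold out a test set (iris is a stand-in for the poster's data).
idx   <- sample(nrow(iris), 100)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Misclassification rate of a fitted model on a data frame.
err <- function(fit, d) mean(predict(fit, d) != d$Species)

# Deliberately extreme settings that let the model memorize the training rows.
loose <- svm(Species ~ ., data = train, kernel = "radial",
             gamma = 100, cost = 1000)

# Coarse grid over gamma and cost, scored by 10-fold CV on the training set only.
tuned <- tune.svm(Species ~ ., data = train, kernel = "radial",
                  gamma = 2^seq(-9, 1, by = 2),
                  cost  = 2^seq(-3, 9, by = 2))

res <- rbind(
  loose = c(train = err(loose, train),            test = err(loose, test)),
  tuned = c(train = err(tuned$best.model, train), test = err(tuned$best.model, test))
)
res
```

The test set is never used during tuning, so its error is an honest check on whichever parameters the grid search selects.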
Liaw, Andy
2006-Mar-01 13:46 UTC
[R] does svm have a CV to obtain the best "cost" parameter?
Do you know that there is (pseudo-)randomness involved in CV? Even if you fix the parameters and run it multiple times, you are going to get different answers, let alone when you change the parameters each time. Hoping to narrow the optimal parameters down to that fine a resolution is generally not realistic. Also, there may well be multiple optima in the CV error `surface'. Take your pick.

Andy

-----Original Message-----
From: Michael [mailto:comtech.usa@gmail.com]
Sent: Wednesday, March 01, 2006 3:59 AM
To: Liaw, Andy
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] does svm have a CV to obtain the best "cost" parameter?

Thanks a lot Andy.

I read that paper and followed the instructions, but ran into a lot of peculiarities:

1. Using the "tune" function for "svm", the error as a function of "cost" turns out to have multiple peaks rather than a single global optimum. So I don't know which one to follow in order to refine my search grid and do a more detailed search in a smaller, more focused range. Please see below.

- Detailed performance results:

           cost      error
1  0.0004882813 0.05065909
2  0.0005608879 0.05122727
3  0.0006442910 0.04895130
4  0.0007400960 0.04725000
5  0.0008501470 0.04497078
6  0.0009765625 0.04497078
7  0.0011217757 0.04497078
8  0.0012885819 0.04440260
9  0.0014801920 0.04155844
10 0.0017002941 0.03985065
11 0.0019531250 0.04099675
12 0.0022435515 0.04327273
13 0.0025771639 0.04099675
14 0.0029603839 0.03929221
15 0.0034005881 0.03986039
16 0.0039062500 0.04157143
17 0.0044871029 0.04099675
18 0.0051543278 0.04042857
19 0.0059207678 0.03871753
20 0.0068011763 0.03871429
21 0.0078125000 0.03985065
22 0.0089742059 0.04042532
23 0.0103086556 0.04042532
24 0.0118415357 0.04099675
25 0.0136023526 0.04042532
26 0.0156250000 0.04440260

2. I first tried 2^(-15:15) and found the best "cost" to be around 2^(-8). Then I reduced the range and ran "tune" on cost values 2^(-11:-6), which returned a best "cost" of 2^(-9), different from 2^(-8). Then I ran it on 2^seq(-11, -6, by = 0.2), and the best "cost" was found to be 2^(-7.2). Each time the best "cost" is at a different value, and with the many local optima shown above, I don't know what range to focus on for the next step.

The code I used is below:

    obj <- tune(svm, x, y,
                ranges = list(cost = 2^seq(-11, -6, by = 0.2)),
                tunecontrol = tune.control(sampling = "cross"),
                kernel = 'linear')

What can I do now?

Thanks a lot!
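One way to act on the "take your pick" advice is to stop chasing the single minimum and treat all near-ties as equivalent. The base R sketch below re-enters the cost/error table from the quoted tune() output and keeps every cost whose CV error is within a tolerance of the best one. The tolerance of 0.005 is a hypothetical value chosen only for illustration (the CV dispersion, which would be a better yardstick, is not shown in the thread), and preferring the smallest cost among the ties is one common heuristic, not something the thread prescribes.

```r
# Cost/error pairs copied from the tune() output quoted in the thread.
perf <- data.frame(
  cost = c(0.0004882813, 0.0005608879, 0.0006442910, 0.0007400960,
           0.0008501470, 0.0009765625, 0.0011217757, 0.0012885819,
           0.0014801920, 0.0017002941, 0.0019531250, 0.0022435515,
           0.0025771639, 0.0029603839, 0.0034005881, 0.0039062500,
           0.0044871029, 0.0051543278, 0.0059207678, 0.0068011763,
           0.0078125000, 0.0089742059, 0.0103086556, 0.0118415357,
           0.0136023526, 0.0156250000),
  error = c(0.05065909, 0.05122727, 0.04895130, 0.04725000,
            0.04497078, 0.04497078, 0.04497078, 0.04440260,
            0.04155844, 0.03985065, 0.04099675, 0.04327273,
            0.04099675, 0.03929221, 0.03986039, 0.04157143,
            0.04099675, 0.04042857, 0.03871753, 0.03871429,
            0.03985065, 0.04042532, 0.04042532, 0.04099675,
            0.04042532, 0.04440260)
)

# The nominal winner of this particular CV run.
best <- perf[which.min(perf$error), ]

# Hypothetical tolerance: errors this close to the minimum count as ties.
tol  <- 0.005
near <- perf[perf$error <= best$error + tol, ]

range(near$cost)   # the whole band of costs that are practically equivalent

# Among the ties, prefer the smallest cost (the most regularized model).
simplest <- near[which.min(near$cost), ]
simplest
```

If the band of near-ties spans most of the grid, as it does here, refining the grid further mainly measures the noise of the fold assignment rather than a real difference between cost values.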