Hi all,

I am using the "svm" command in the e1071 package. Does it have an automatic way of setting the "cost" parameter? I tried a few values for "cost", but I hope there is a systematic way of obtaining the best value.

I noticed that there is a "cross" (cross-validation) parameter in the "svm" function, but I did not see how it can be used to optimize the "cost" parameter.

By the way, what does a training error of 0 together with a high testing error mean? Varying "cross=5", "cross=10", etc. does not change the training or testing error at all. How can I improve this?

Thanks a lot!

M.
Liaw, Andy
2006-Feb-28  12:14 UTC
[R] does svm have a CV to obtain the best "cost" parameter?
From: Michael
> Hi all,
>
> I am using the "svm" command in the e1071 package.
>
> Does it have an automatic way of setting the "cost" parameter?

See ?best.svm in that package.

> I changed a few values for the "cost" parameter but I hope there is a
> systematic way of obtaining the best "cost" value.
>
> I noticed that there is a "cross" (Cross validation) parameter in the
> "svm" function.
>
> But I did not see how it can be used to optimize the "cost" parameter.
>
> By the way, what does a 0 training error and a high testing error mean?
> Varying "cross=5", or "cross=10", etc. does not change the training error
> and testing error at all. How to improve?

That is overfitting, which changing the validation method will not solve.

Andy
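The distinction between the "cross" argument of svm() and a grid search can be sketched as follows. This is only an illustration under assumptions: the built-in iris data stands in for the poster's (unknown) data, and the cost grid 2^(-5:5) is arbitrary. With "cross", svm() reports a cross-validated accuracy for the one cost value it was given; tune.svm() repeats the cross-validation for every cost value in a grid and keeps the one with the lowest error. As far as I know, best.svm() is a wrapper around the same search that returns only the best model.

```r
library(e1071)

set.seed(1)  # CV folds are assigned at random; fix the seed to make runs repeatable

# "cross" only evaluates the single cost value supplied; it does not tune it.
# iris is an assumed stand-in data set for this sketch.
fit <- svm(Species ~ ., data = iris, kernel = "linear", cost = 1, cross = 5)
fit$tot.accuracy  # 5-fold CV accuracy (in percent) for cost = 1

# Grid search: 10-fold CV error for each cost value in an (arbitrary) grid
tuned <- tune.svm(Species ~ ., data = iris,
                  kernel = "linear",
                  cost = 2^(-5:5),
                  tunecontrol = tune.control(sampling = "cross", cross = 10))

tuned$best.parameters           # cost with the lowest CV error
tuned$best.performance          # that CV error
best_model <- tuned$best.model  # svm refitted on all data with that cost
```

The CV error in tuned$performances, not the training error, is the quantity to compare across cost values.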
Liaw, Andy
2006-Feb-28  21:47 UTC
[R] does svm have a CV to obtain the best "cost" parameter?
You might find http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf helpful.

Parameter tuning is essential for avoiding overfitting.

Andy

-----Original Message-----
From: Michael [mailto:comtech.usa@gmail.com]
Sent: Tuesday, February 28, 2006 3:30 PM
To: Liaw, Andy
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] does svm have a CV to obtain the best "cost" parameter?

Hi Andy,

Thanks a lot for your answer! So what do I do if the model overfits?

Thanks a lot!
Liaw, Andy
2006-Mar-01  13:46 UTC
[R] does svm have a CV to obtain the best "cost" parameter?
Do you know that there is (pseudo-)randomness involved in CV?  Even if you
fix the parameters and run it multiple times, you're going to get different
answers, let alone when changing the parameters each time.  Hoping to narrow
the optimal parameters down to that fine a resolution is generally not
realistic.  Also, there may well be multiple optima in the CV error
'surface'.  Take your pick.
 
Andy
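The randomness described above can be made visible with a small experiment. The sketch below is an illustration only: iris is an assumed stand-in data set, and the grid and the five seeds are arbitrary choices. It runs the same 10-fold CV grid search under different random fold assignments and compares the resulting error curves.

```r
library(e1071)

costs <- 2^(-4:4)  # arbitrary illustrative grid

# Repeat the same grid search under five different random fold assignments.
# Each column of cv_errors holds the CV error per cost value for one seed.
cv_errors <- sapply(1:5, function(seed) {
  set.seed(seed)
  tune.svm(Species ~ ., data = iris, kernel = "linear",
           cost = costs)$performances$error
})
rownames(cv_errors) <- format(costs)

cv_errors                              # error curves, one column per seed
costs[apply(cv_errors, 2, which.min)]  # "best" cost found under each seed
rowMeans(cv_errors)                    # averaged curve, usually smoother
```

If the selected cost moves around between seeds, differences of that size between neighbouring grid points are noise rather than structure. Calling set.seed() before tune() makes a single run reproducible, but it does not make that run's optimum any more reliable.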
-----Original Message-----
From: Michael [mailto:comtech.usa@gmail.com] 
Sent: Wednesday, March 01, 2006 3:59 AM
To: Liaw, Andy
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] does svm have a CV to obtain the best "cost" parameter?
Thanks a lot Andy.
I read that paper and followed the instructions, but ran into a lot of
peculiarities:
1. Using the "tune" function for "svm", the CV error as a function of
"cost" turns out to have multiple local optima rather than a single global
one. So I don't know which one to follow in order to refine my search grid
and do a more detailed search in a smaller, focused range. Please see below.
- Detailed performance results:
           cost      error
1  0.0004882813 0.05065909
2  0.0005608879 0.05122727
3  0.0006442910 0.04895130
4  0.0007400960 0.04725000
5  0.0008501470 0.04497078
6  0.0009765625 0.04497078
7  0.0011217757 0.04497078
8  0.0012885819 0.04440260
9  0.0014801920 0.04155844
10 0.0017002941 0.03985065
11 0.0019531250 0.04099675
12 0.0022435515 0.04327273
13 0.0025771639 0.04099675
14 0.0029603839 0.03929221
15 0.0034005881 0.03986039
16 0.0039062500 0.04157143
17 0.0044871029 0.04099675
18 0.0051543278 0.04042857
19 0.0059207678 0.03871753
20 0.0068011763 0.03871429 
21 0.0078125000 0.03985065
22 0.0089742059 0.04042532
23 0.0103086556 0.04042532
24 0.0118415357 0.04099675
25 0.0136023526 0.04042532
26 0.0156250000 0.04440260
2. I first tried 2^(-15:15) and found the best "cost" to be around 2^(-8).
Then I reduced the range and ran "tune" on cost values 2^(-11:-6), and it
returned a best "cost" value of 2^(-9), which is different from 2^(-8).
Then I ran it on 2^seq(-11, -6, by = 0.2), and the best "cost" value was
found to be 2^(-7.2). Each time the best "cost" is at a different value,
and with the many local optima shown above, I don't know which range I
should focus on for the next step...
The code I've used is as below:
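library(e1071)  # provides svm(), tune() and tune.control()

# x: predictors, y: response (my data, not shown here)
# cross-validation (10-fold by default) over a grid of cost values
obj <- tune(svm, x, y,
            ranges = list(cost = 2^seq(-11, -6, by = 0.2)),
            tunecontrol = tune.control(sampling = "cross"),
            kernel = "linear")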
------------------------
What can I do now?
Thanks a lot!
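One way to read the performance table above is to ask how many cost values are practically indistinguishable from the minimum. The sketch below re-enters the reported errors and lists every cost whose CV error lies within a tolerance of the best one. The tolerance of 0.002 is purely an assumption for illustration; a sensible value depends on how much the CV error varies between runs on the actual data, which the thread does not report.

```r
# CV errors copied from the table above; cost grid as in the tune() call
perf <- data.frame(
  cost  = 2^seq(-11, -6, by = 0.2),
  error = c(0.05065909, 0.05122727, 0.04895130, 0.04725000, 0.04497078,
            0.04497078, 0.04497078, 0.04440260, 0.04155844, 0.03985065,
            0.04099675, 0.04327273, 0.04099675, 0.03929221, 0.03986039,
            0.04157143, 0.04099675, 0.04042857, 0.03871753, 0.03871429,
            0.03985065, 0.04042532, 0.04042532, 0.04099675, 0.04042532,
            0.04440260)
)

best <- which.min(perf$error)
log2(perf$cost[best])  # position of the minimum on the log2 scale

tol  <- 0.002  # assumed tolerance, not taken from the thread
near <- perf[perf$error <= min(perf$error) + tol, ]
near                   # all cost values within tol of the best error
range(log2(near$cost)) # how wide the near-optimal region is
```

If the near-optimal region is wide compared with the grid spacing, refining the grid further is unlikely to help, and any cost inside that region is a defensible choice.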
----------------------------------------------------------------------------
--
Notice:  This e-mail message, together with any attachments, contains
information of Merck & Co., Inc. (One Merck Drive, Whitehouse Station, New
Jersey, USA 08889), and/or its affiliates (which may be known outside the
United States as Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as
Banyu) that may be confidential, proprietary copyrighted and/or legally
privileged. It is intended solely for the use of the individual or entity
named on this message.  If you are not the intended recipient, and have
received this message in error, please notify us immediately by reply e-mail
and then delete it from your system. 
----------------------------------------------------------------------------
--