I have a data set with about 30,000 training cases and 103 variables. I've trained an SVM (using the e1071 package) for a binary classifier {0,1}. The accuracy isn't great.

I used a grid search over the C and G parameters with an RBF kernel to find the best settings.

I remember that for least squares, R has a nice stepwise function that will try combining subsets of variables to find the optimal result. Clearly, this doesn't exist for SVMs as a built-in function.

As an experiment, I simply grabbed the first 50 variables and repeated the training/grid search procedure. The results were significantly better. Since the data are VERY noisy, my guess is that eliminating some of the variables removed some of that noise, which led to better results.

With a grid of 100 parameter settings (10 for C, 10 for G) and 106 variables, trying every combination would be prohibitively time consuming.

Can anyone suggest an approach to seek the ideal subset of variables for my SVM classifier?

Thanks!
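[A minimal sketch of the grid search described above, using e1071's tune.svm(). The data frame `dat`, the factor response `y`, and the cost/gamma ranges are illustrative placeholders, not the poster's actual setup.]

library(e1071)

## dat: data.frame with the predictors and a factor response `y` (levels "0"/"1")
dat$y <- factor(dat$y)

## 10 x 10 grid over cost (C) and gamma (G) with an RBF kernel;
## tune() cross-validates each combination (10-fold by default)
tuned <- tune.svm(y ~ ., data = dat,
                  kernel = "radial",
                  cost   = 2^(-2:7),
                  gamma  = 2^(-8:1))

summary(tuned)                 # CV error for every (cost, gamma) pair
best.fit <- tuned$best.model   # refitted model at the best grid point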
Hi,

On Fri, Jan 7, 2011 at 2:10 AM, Noah Silverman <noah at smartmediacorp.com> wrote:

> I have a data set with about 30,000 training cases and 103 variables.
> I've trained an SVM (using the e1071 package) for a binary classifier
> {0,1}. The accuracy isn't great.
[...]
> Can anyone suggest an approach to seek the ideal subset of variables for
> my SVM classifier?

Sounds like a job for the types of approaches found in the penalizedSVM package:

http://cran.r-project.org/web/packages/penalizedSVM/index.html

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
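[A rough sketch of what that might look like, assuming penalizedSVM's svm.fs() entry point with a SCAD penalty. The object names, the -1/+1 label coding, and the lambda grid are illustrative; check ?svm.fs for the exact arguments in your version of the package.]

library(penalizedSVM)

## x: matrix of predictors; y: labels recoded to -1 / +1 as svm.fs expects
x <- as.matrix(dat[, setdiff(names(dat), "y")])
y <- ifelse(dat$y == 1, 1, -1)

fit <- svm.fs(x, y,
              fs.method   = "scad",       # SCAD penalty shrinks the weights of
              grid.search = "discrete",   # uninformative variables to zero
              lambda1.set = 2^(-5:2),     # illustrative penalty grid
              seed        = 123)

str(fit)   # inspect the fitted object for the selected variables and their weights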
On 06/01/11 23:10:59, Noah Silverman wrote:

> I have a data set with about 30,000 training cases and 103 variables.
> I've trained an SVM (using the e1071 package) for a binary classifier
> {0,1}. The accuracy isn't great. I used a grid search over the C and G
> parameters with an RBF kernel to find the best settings.
[...]
> Can anyone suggest an approach to seek the ideal subset of variables for
> my SVM classifier?

The standard feature selection stuff (backward/forward selection etc.) is probably ruled out by the time it takes to compute all the sets and subsets. What you could try is the following (an R sketch follows after this message):

1. Set up a cross-validation split: divide your data set into a training and a testing set (ratio 0.9 / 0.1 or so).

2. Train your SVM on the training set (try conservative parameters first).

3. Have the trained SVM classify the test set and compute the classification error.

4. Iterate over all variables:
   a) choose one variable and permute its values (only) in the test set;
   b) have the trained SVM (from step 2) classify this permuted test set and measure the classification error;
   c) repeat a) and b) a (high) number of times to get a stable estimate;
   d) go to the next variable.

5. You can get an impression of a variable's importance by comparing the errors on the test set with that variable permuted against the error on the unpermuted test set. If permuting one variable drastically increases the classification error, that variable is probably important.

6. Repeat the cross-validation / random sampling a number of times so the estimates are reliable.

This is more of an ad-hoc approach and there are some pitfalls, but the idea is easily explained and it carries over to any other regression model fitted with cross-validation. The computational burden in an SVM is assumed to lie in the training rather than the prediction step, and you only need a relatively low number of training runs (step 6) here.

Regards,
Georg.
--
Research Assistant
Otto-von-Guericke-Universität Magdeburg
research at georgruss.de
http://research.georgruss.de
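[A minimal R sketch of the permutation procedure above, using e1071. It assumes a data frame `dat` with a factor response `y`; the split ratio, SVM parameters, and number of permutations are placeholders.]

library(e1071)

set.seed(1)
## dat: data.frame with factor response `y` and the numeric predictors
n     <- nrow(dat)
test  <- sample(n, size = round(0.1 * n))          # step 1: 90/10 split
train <- setdiff(seq_len(n), test)

## step 2: train with conservative parameters
fit <- svm(y ~ ., data = dat[train, ], kernel = "radial", cost = 1)

## step 3: baseline misclassification error on the untouched test set
err <- function(model, newdata) mean(predict(model, newdata) != newdata$y)
base.err <- err(fit, dat[test, ])

## step 4: permute one variable at a time in the test set and re-measure the error
vars  <- setdiff(names(dat), "y")
nperm <- 20                                        # step 4c: repeats per variable
imp <- sapply(vars, function(v) {
  mean(replicate(nperm, {
    perm <- dat[test, ]
    perm[[v]] <- sample(perm[[v]])                 # step 4a: permute values of v only
    err(fit, perm)                                 # step 4b: error on permuted set
  }))
})

## step 5: increase over the baseline error as a rough importance score
sort(imp - base.err, decreasing = TRUE)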