Jokel Meyer
2011-Sep-27 09:26 UTC
[R] Workflow for binary classification problem using svm via e1071 package
Dear R-list!

I am using the e1071 package in R to solve a binary classification problem on a dataset of around 180 predictor variables (blood metabolites) measured in two groups of subjects (patients and healthy controls). I am confused about the correct way to assess the classification accuracy of the trained SVM.

(A) The svm() function lets you request k-fold cross-validation via the 'cross=k' parameter. This yields k values for classification accuracy and their mean.

(B) On the other hand, most textbooks and tutorials I have been browsing recommend splitting the data into a training set and a test set, and then assessing prediction accuracy by applying the trained SVM to the test set.

I am not sure whether (A) and (B) are alternative ways to assess prediction accuracy. Or is option (A) only the accuracy of the SVM when applied to the test set, so that I should implement option (B) after using option (A)?

So would the correct workflow be to first use (A), then run a grid search (via the tune command) to find the best hyperparameters, and then test the trained SVM by applying it to the test set?

And in case I use a linear kernel instead of RBF, I guess I do not need to run a grid search, as there are no hyperparameters to estimate?

Best,
Jokel
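A minimal sketch of how options (A) and (B) can fit together, assuming the common nested workflow: set aside a test set first, choose hyperparameters by k-fold cross-validation on the training set only, then score the chosen model once on the test set. To keep it dependency-free it is plain Python, so a toy k-nearest-neighbours classifier stands in for e1071's svm(), with the neighbour count playing the role of an SVM hyperparameter such as cost or gamma. The data generator and every function name here (make_toy_data, k_fold_indices, knn_predict, cv_accuracy, holdout_workflow) are hypothetical illustrations, not part of e1071.

```python
import random


def make_toy_data(n_per_group=60, n_features=5, shift=1.0, seed=0):
    """Hypothetical stand-in for the metabolite table: two Gaussian groups."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mu in (("control", 0.0), ("patient", shift)):
        for _ in range(n_per_group):
            X.append([rng.gauss(mu, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k (nearly) equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(X_train, y_train, x, n_neighbors):
    """Toy classifier standing in for svm(): majority vote of nearest rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = [label for _, label in dists[:n_neighbors]]
    # sorted() makes any tie-break deterministic.
    return max(sorted(set(votes)), key=votes.count)


def accuracy(X_train, y_train, X_eval, y_eval, n_neighbors):
    preds = [knn_predict(X_train, y_train, x, n_neighbors) for x in X_eval]
    return sum(p == t for p, t in zip(preds, y_eval)) / len(y_eval)


def cv_accuracy(X, y, n_neighbors, k=10, seed=0):
    """Option (A): mean accuracy over k folds, in the spirit of cross = k."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        X_tr = [X[i] for i in range(len(X)) if i not in held_out]
        y_tr = [y[i] for i in range(len(X)) if i not in held_out]
        X_va = [X[i] for i in fold]
        y_va = [y[i] for i in fold]
        scores.append(accuracy(X_tr, y_tr, X_va, y_va, n_neighbors))
    return sum(scores) / len(scores)


def holdout_workflow(X, y, test_fraction=0.3, candidates=(1, 3, 5, 7), k=10, seed=0):
    # Option (B): split off a test set first; it is not used during tuning.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
    X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
    # Grid search scored by k-fold CV on the training rows only.
    cv_scores = {c: cv_accuracy(X_tr, y_tr, c, k, seed) for c in candidates}
    best = max(candidates, key=lambda c: cv_scores[c])
    # k-NN has no fitting step: "refitting" means using all training rows
    # as neighbours. The test set is scored exactly once.
    test_acc = accuracy(X_tr, y_tr, X_te, y_te, best)
    return best, cv_scores, test_acc


if __name__ == "__main__":
    X, y = make_toy_data()
    best, cv_scores, test_acc = holdout_workflow(X, y)
    print("CV accuracy per candidate:", cv_scores)
    print("chosen n_neighbors:", best, "test accuracy:", round(test_acc, 3))
```

In e1071 terms, the inner loop corresponds roughly to running tune() or tune.svm() on the training rows only (10-fold cross-validation is its default), and the last step to predict() on the held-out rows. Note that a linear-kernel SVM still has the cost parameter C (the cost argument of svm()), so a one-dimensional search over it would still apply.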