Hi: I don't know anything about genotypes, but it sounds like you overfitted
the training set, so you should try stronger regularization. In standard
SVM classification algorithms, that is done by decreasing the parameter
C, which lowers the penalty in the objective function for misclassifying
training points. (This lets the margin widen by allowing the algorithm to
misclassify more often.) But you're using caret rather than one of the SVM
packages directly, so the parameter might be called something other than C.
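
For what it's worth, here is a rough sketch of how the cost could be tuned
through caret. As far as I know, caret's "svmRadial" method (which wraps
kernlab) exposes the tuning parameters as sigma and C. The data below are
simulated stand-ins (132 individuals, 20 made-up SNP columns coded 0/1/2),
and the grid values are only my guesses at a reasonable range, so treat it
as an illustration rather than a recipe.

```r
library(caret)  # train(), trainControl(); "svmRadial" also needs kernlab

set.seed(1)

# Hypothetical stand-in data: 132 individuals x 20 SNPs coded 0/1/2.
n <- 132
p <- 20
x <- as.data.frame(matrix(sample(0:2, n * p, replace = TRUE), nrow = n))
names(x) <- paste0("snp", seq_len(p))

# Made-up case/control label that depends weakly on the first two SNPs.
y <- factor(ifelse(x$snp1 + x$snp2 + rnorm(n) > 2, "case", "control"))

# 5-fold cross-validation on the training data only.
ctrl <- trainControl(method = "cv", number = 5)

# Candidate values (assumed, not recommended defaults).
# Smaller C means a weaker misclassification penalty, i.e. more regularization.
grid <- expand.grid(sigma = c(0.005, 0.01, 0.05),
                    C     = c(0.01, 0.1, 0.25, 0.5, 1))

fit <- train(x = x, y = y,
             method    = "svmRadial",
             trControl = ctrl,
             tuneGrid  = grid)

print(fit$bestTune)                                 # chosen sigma and C
print(fit$results[, c("sigma", "C", "Accuracy")])   # CV accuracy per setting
```

If the cross-validated accuracy at the chosen setting is much closer to your
test accuracy than the 76% figure, that would point to the earlier estimate
being optimistic rather than to anything specific to the test individuals.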
There are many books on support vector machines, but a nice intro from an
R perspective is "Support Vector Machines in R" in the Journal of
Statistical Software. (It's free at www.jstatsoft.com.)
On Fri, Jun 15, 2012 at 8:19 AM, Guido Leoni <guido.leoni@gmail.com>
wrote:
> Dear list
> I have a generic question about how to tune an SVM.
> I'm trying to classify, with the caret package, some population data from a
> case-control study. Each column of my matrix holds the SNP
> genotypes, and each row is an individual.
> I split my total dataset into training (132 individuals) and test
> (50 individuals) sets, preserving the total observed genotypic frequencies
> and the % of cases and controls.
> After training (with the radial basis function (RBF) kernel) the best model
> has an accuracy of 76%, but when I apply the model to my test dataset the
> accuracy decreases to 52%.
> Obviously I expected a decrease, but this appears to be quite big in my
> opinion.
> I manually checked the predictions for my test dataset and some cases that
> have no risk allele are not well classified. Similar cases in my training
> dataset are well recognized.
> Could you please suggest which parameters to modify in order to improve
> the classification for the test dataset? Or better, what could be the
> causes of this big discrepancy?
> I know that my question is very generic, but I'm very new to this kind of
> analysis, so any suggestion is welcome.
> thank you very much
> Guido
>
> --
> Guido Leoni
> National Research Institute on Food and Nutrition
> (I.N.R.A.N.)
> via Ardeatina 546
> 00178 Rome
> Italy
>
> tel + 39 06 51 49 41 (operator)
> + 39 06 51 49 4498 (direct)