Why exactly do you want to "stabilize" your results?
If it's in preparation for publication, a classroom demo, etc., then
certainly resetting the seed before each run (and hence getting the same
sample() output) will make your results exactly reproducible. However, if
you are looking for a clearer picture of the true efficacy of your SVM,
and there's no real underlying order to the data set (i.e., it's not a
time series), then a straight sample() seems better to me.
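Just to make the reproducibility point concrete, a toy call in base R
(no SVM needed) shows that re-seeding gives the identical "random" draw:

```r
# Resetting the seed right before sample() makes the split reproducible:
set.seed(23)
idx1 <- sample(10, 5)
set.seed(23)
idx2 <- sample(10, 5)
identical(idx1, idx2)  # TRUE -- same draw both times
```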
I'm not particularly well read in the SVM literature, but it sounds like
you are worried by widely varying performance of the SVM itself. If that's
the case, it seems (to me at least) that certain data points are strongly
informative, and it might be a more interesting question to look into
which ones those are.
I guess my answer, as a total non-savant in the field, is that it depends
on your goal: repeated runs with sample() will give you more information
about the strength of the SVM, while setting the seed will give you
reproducibility. Importance sampling might also be of interest,
particularly if it could be tied to the information content of each data
point, and a quick skim of the Monte Carlo variance-reduction literature
might just provide some fun insights.
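In case it helps, here is a rough sketch of what I mean by repeated runs:
collect the held-out accuracy over many random splits and look at its
spread. I'm assuming you are using svm() from the e1071 package, and I've
used the built-in iris data (with Species as the class) as a stand-in for
your myData/Factor:

```r
library(e1071)  # assuming this is where your svm() comes from

# iris stands in for myData, Species for Factor
accs <- replicate(50, {
  idx <- sample(nrow(iris), trunc(nrow(iris) / 3))       # random 1/3 hold-out
  fit <- svm(Species ~ ., data = iris[-idx, ])           # train on the rest
  mean(predict(fit, iris[idx, ]) == iris$Species[idx])   # held-out accuracy
})
c(mean = mean(accs), sd = sd(accs))  # average performance and its spread
```

The sd() across splits is exactly the run-to-run variability that a fixed
seed hides.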
I'm not entirely sure how you mean to bootstrap the act of setting the
seed (a randomly chosen seed seems to be the same as not setting a seed at
all), but that might give you a nice middle ground.
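One small aside on the code in your message: tune.svm() returns an object
you can pull the tuned parameters out of, and the model should be fit on
the training set and then evaluated on the held-out test set. A sketch of
one way to wire that up (again assuming e1071, with iris standing in for
myData and purely illustrative tuning grids):

```r
library(e1071)  # assumed package for svm() and tune.svm()

# iris stands in for myData, Species for Factor; split as in your code
idx      <- sample(nrow(iris), trunc(nrow(iris) / 3))
testset  <- iris[idx, ]
trainset <- iris[-idx, ]

# tune on the training data only; grids below are just for illustration
tuned <- tune.svm(Species ~ ., data = trainset,
                  cost = 10^(0:2), gamma = 10^(-2:0))

# refit on the training set using the parameters tune.svm() selected
svm.model <- svm(Species ~ ., data = trainset,
                 cost  = tuned$best.parameters$cost,
                 gamma = tuned$best.parameters$gamma)

table(predict(svm.model, testset), testset$Species)  # confusion matrix
```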
Sorry this can't be of more help,
Michael
On Mon, Sep 26, 2011 at 6:32 PM, Riccardo G-Mail
<ric.romoli@gmail.com> wrote:
> Hi, I'm working with support vector machines for classification, and I
> have a problem with the accuracy of prediction.
>
> I divided my data set into train (1/3 of the entire data set) and test
> (2/3 of the data set) using the "sample" function. Each time I fit the
> svm model I obtain a different result, depending on the output of the
> "sample" function. I would like to "stabilize" the performance of my
> analysis. To do this I used the "set.seed" function. Is there a better
> way to do this? Should I perform a bootstrap on my workflow (sample and
> svm)?
> Here is an example of my workflow:
> ### not to run
> index <- 1:nrow(myData)
> set.seed(23)
> testindex <- sample(index, trunc(length(index)/3))
> testset <- myData[testindex, ]
> trainset <- myData[-testindex, ]
>
> tune.svm()
> svm.model <- svm(Factor ~ ., data = myData, cost = from tune.svm,
>                  gamma = from tune.svm, cross = 10, subset = testset)
> summary(svm.model)
> predict(svm.model, testset)
>
> Best
> Riccardo
>
> ________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.