Dear R:ers,

I'm using svm from the e1071 package to train a model with the option
"probability = TRUE". I then use "predict" with "probability = TRUE"
and get the probabilities for a data point belonging to either class.
So far all is well.

My question is why I get different results each time I train the
model, although I use exactly the same data. The prediction itself
seems to be reproducible, but if I re-train the model, the
probabilities vary somewhat.

Here, I have trained a model on exactly the same data five times.
When predicting using the different models, this is how the
probabilities vary:

probabilities
     Grp.0     Grp.1
 0.7077155 0.2922845
 0.7938782 0.2061218
 0.8178833 0.1821167
 0.7122203 0.2877797

How can the predictions using the same training and test data vary so
much?

Thanks,
Anders
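The variation Anders reports is easy to reproduce, and it helps to see it in runnable form before the replies below. A minimal sketch, using a two-class subset of iris as a stand-in for his unposted data: with probability = TRUE, the probability model is fitted by an internal cross-validation whose fold assignment is random, so in current versions of e1071 (where that randomness comes from R's RNG) calling set.seed() before each svm() call should make retraining reproducible.

library(e1071)

## Stand-in data: a two-class subset of iris (Anders' data was not posted)
d <- subset(iris, Species != "virginica")
d$Species <- factor(d$Species)

## Retraining without fixing the RNG: probabilities drift between runs
m1 <- svm(Species ~ ., data = d, probability = TRUE)
m2 <- svm(Species ~ ., data = d, probability = TRUE)
p1 <- attr(predict(m1, d, probability = TRUE), "probabilities")
p2 <- attr(predict(m2, d, probability = TRUE), "probabilities")
max(abs(p1 - p2))  # small but typically non-zero

## Fixing the RNG state before each fit should make retraining reproducible
set.seed(42)
m1 <- svm(Species ~ ., data = d, probability = TRUE)
set.seed(42)
m2 <- svm(Species ~ ., data = d, probability = TRUE)
p1 <- attr(predict(m1, d, probability = TRUE), "probabilities")
p2 <- attr(predict(m2, d, probability = TRUE), "probabilities")
max(abs(p1 - p2))  # 0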
Hi Anders,

On Oct 21, 2009, at 8:49 AM, Anders Carlsson wrote:

> Dear R:ers,
>
> I'm using svm from the e1071 package to train a model with the
> option "probability = TRUE". I then use "predict" with
> "probability = TRUE" and get the probabilities for a data point
> belonging to either class. So far all is well.
>
> My question is why I get different results each time I train the
> model, although I use exactly the same data. The prediction itself
> seems to be reproducible, but if I re-train the model, the
> probabilities vary somewhat.
>
> Here, I have trained a model on exactly the same data five times.
> When predicting using the different models, this is how the
> probabilities vary:

I'm not sure I'm following the example you're giving and the scenario
you are describing.

> probabilities
>      Grp.0     Grp.1
>  0.7077155 0.2922845
>  0.7938782 0.2061218
>  0.8178833 0.1821167
>  0.7122203 0.2877797

This seems fine to me: it looks like the probabilities of class
membership for 4 examples (note that Grp.0 + Grp.1 = 1).

> How can the predictions using the same training and test data vary
> so much?

I'm trying the code below several times (taken from the ?svm example),
and the probabilities calculated from the call to predict don't change
much at all:

R> library(e1071)
R> data(iris)
R> attach(iris)
R> x <- subset(iris, select = -Species)
R> y <- Species
R> model <- svm(x, y, probability = TRUE)
R> predict(model, x, probability = TRUE)

To be fair, the probabilities aren't exactly the same, but the
difference between two runs is really small:

R> model <- svm(x, y, probability = TRUE)
R> a <- predict(model, x, probability = TRUE)
R> model <- svm(x, y, probability = TRUE)
R> b <- predict(model, x, probability = TRUE)
R> mean(abs(attr(a, 'probabilities') - attr(b, 'probabilities')))
[1] 0.003215959

Is this what you were talking about, or ... ?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
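Steve's mean over all 450 entries of the iris probability matrix can mask larger swings on individual points, which is closer to what Anders is seeing. A short extension of his snippet (same iris setup assumed) retrains the model ten times and looks at the worst-case spread per entry:

library(e1071)
data(iris)
x <- subset(iris, select = -Species)
y <- iris$Species

## Retrain 10 times; collect the 150 x 3 probability matrices in an array
probs <- replicate(10, {
  model <- svm(x, y, probability = TRUE)
  attr(predict(model, x, probability = TRUE), "probabilities")
}, simplify = "array")

## Largest swing any single probability shows across the 10 retrainings
spread <- apply(probs, c(1, 2), function(p) max(p) - min(p))
max(spread)

If that maximum is on the order of Anders' 15%, the mean-level agreement Steve observes and the point-level disagreement Anders observes are not actually in conflict.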
Hi,

> <snip>
>
> > If I instead output the decision values, the whole procedure is
> > fully reproducible, i.e. the exact same values are returned when I
> > retrain the model.
>
> By the decision values, you mean the predicted labels, right?

The output of decision values can be turned on in predict.svm, and
is, as I understand it, the distance from the data point to the
hyperplane. (I should say that my knowledge here is limited to
concepts; I know nothing about the details of how this works...) I
use these to create ROC curves etc.

> > I have no idea how the probabilities are calculated, but it seems to
> > be in this step that the differences arise. In my case, I feel a bit
> > hesitant to use them when they differ that much between runs (15% or
> > so)...
>
> I'd find that a bit disconcerting, too. Can you give a sample of your
> data + the code you're using that can reproduce this example?

I have the data at the office, so I can't do that now (at home).

> Warning: Brainstorming Below
>
> If I were to calculate probabilities for my class labels, I'd make the
> probability some function of the example's distance from the decision
> boundary.
>
> Now, if your decision boundary isn't changing from run to run (and I
> guess it really shouldn't be, since the SVM returns the maximum margin
> classifier (which is, by definition, unique, right?)), it's hard to
> imagine why these probabilities would change, either ...
>
> ... unless you're holding out different subsets of your data during
> training, or perhaps have a different value for your penalty (cost)
> parameter when building the model. I believe you said that you're
> actually training the same exact model each time, though, right?

Yes, I'm using the exact same data to train each time. I thought this
would generate identical models, but that doesn't appear to be the
case.

> Anyway, I see the help page for ?svm says this, if it helps:
>
> "The probability model for classification fits a logistic distribution
> using maximum likelihood to the decision values of all binary
> classifiers, and computes the a-posteriori class probabilities for the
> multi-class problem using quadratic optimization"

This is where I realise I'm in a bit over my head on the theory side -
this means nothing to me...

> -steve

Thanks again,
Anders
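For completeness, the decision values Anders mentions are requested through the same predict call (two-class iris subset again assumed as stand-in data). They come straight from the fitted hyperplane, with no extra randomized fitting step, which is consistent with his finding that they are fully reproducible, and they are also what one would feed to a ROC package such as ROCR:

library(e1071)

d <- subset(iris, Species != "virginica")
d$Species <- factor(d$Species)

## Fit the same model twice on identical data
m1 <- svm(Species ~ ., data = d)
m2 <- svm(Species ~ ., data = d)

## Signed distance (up to scaling) of each point from the hyperplane
dv1 <- attr(predict(m1, d, decision.values = TRUE), "decision.values")
dv2 <- attr(predict(m2, d, decision.values = TRUE), "decision.values")

identical(dv1, dv2)  # TRUE here: the hyperplane fit itself is deterministic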