stefania.pecore at univ-ubs.fr
2017-Sep-02 15:26 UTC
[R] problem in testing data with e1071 package (SVM Multiclass)
Hello all,

this is the first time I'm using R, the e1071 package and multiclass SVM (and I'm not a statistician), so I'm quite confused.

The goal is: a sentence with "sunny" should be classified as a "yes" sentence; a sentence with "cloud" should be classified as "maybe"; a sentence with "rainy" should be classified as "no". The real goal is to do some text classification that I can then apply in my research.

I have two files:

* train.csv: a file with two columns/variables; one is the data, the other is the label. Example:

     V1                 V2
  1  sunny              yes
  2  sunny sunny        yes
  3  sunny rainy sunny  yes
  4  sunny cloud sunny  yes
  5  rainy              no
  6  rainy rainy        no
  7  rainy sunny rainy  no
  8  rainy cloud rainy  no
  9  cloud              maybe
  10 cloud cloud        maybe
  11 cloud rainy cloud  maybe
  12 cloud sunny cloud  maybe

* test.csv: this file contains the new data to be classified, in a single column/variable. Example:

     V1
  1  sunny
  2  rainy
  3  hello
  4  cloud
  5  a
  6  b
  7  cloud
  8  d
  9  e
  10 f
  11 g
  12 hello

Following the examples based on the iris dataset (https://cran.r-project.org/web/packages/e1071/e1071.pdf and http://rischanlab.github.io/SVM.html), I created my model and then tested it on the training data in this way:

  > library(e1071)
  > train <- read.csv(file = "./train.csv", sep = ";", header = FALSE)
  > test <- read.csv(file = "./test.csv", sep = ";", header = FALSE)
  > attach(train)
  > x <- subset(train, select = -V2)
  > y <- V2
  > model <- svm(V2 ~ ., data = train, probability = TRUE)
  > summary(model)

  Call:
  svm(formula = V2 ~ ., data = train, probability = TRUE)

  Parameters:
     SVM-Type:  C-classification
   SVM-Kernel:  radial
         cost:  1
        gamma:  0.08333333

  Number of Support Vectors:  12
   ( 4 4 4 )

  Number of Classes:  3

  Levels:
   maybe no yes

  > pred <- predict(model, x)
  > system.time(pred <- predict(model, x))
     user  system elapsed
        0       0       0
  > table(pred, y)
         y
  pred    maybe no yes
    maybe     4  0   0
    no        0  4   0
    yes       0  0   4
  > pred
      1     2     3     4     5     6     7     8     9    10    11    12
    yes   yes   yes   yes    no    no    no    no maybe maybe maybe maybe
  Levels: maybe no yes

I think everything is fine up to this point. Now the question is: what about the test data? I could not find any example that covers the test data.
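For reference, the setup above can be reproduced without the CSV files. The sketch below rebuilds both data frames inline and inspects how R encodes the text column. It assumes that `read.csv` converted the strings to factors (the default before R 4.0), which is mimicked here with `stringsAsFactors = TRUE`. That assumption fits the reported gamma of 0.08333333 = 1/12, because e1071 sets gamma to 1 divided by the number of input columns by default. The `model.matrix` call only approximates what the formula interface builds internally; it is not taken from the session above.

```r
# Inline copies of train.csv and test.csv as described in the post.
# stringsAsFactors = TRUE is an assumption mimicking pre-4.0 read.csv defaults.
train <- data.frame(
  V1 = c("sunny", "sunny sunny", "sunny rainy sunny", "sunny cloud sunny",
         "rainy", "rainy rainy", "rainy sunny rainy", "rainy cloud rainy",
         "cloud", "cloud cloud", "cloud rainy cloud", "cloud sunny cloud"),
  V2 = rep(c("yes", "no", "maybe"), each = 4),
  stringsAsFactors = TRUE
)
test <- data.frame(
  V1 = c("sunny", "rainy", "hello", "cloud", "a", "b",
         "cloud", "d", "e", "f", "g", "hello"),
  stringsAsFactors = TRUE
)

# With a factor predictor and no intercept, every distinct sentence
# becomes its own indicator (dummy) column.
mm <- model.matrix(~ V1 - 1, data = train)
n_cols <- ncol(mm)
print(n_cols)       # number of indicator columns
print(1 / n_cols)   # compare with the gamma shown by summary(model)

# Sentences in the test file that never occur as a whole in the training file.
unseen <- setdiff(levels(test$V1), levels(train$V1))
print(unseen)
```

If the assumption holds, each whole sentence is treated as one category rather than as a collection of words, and the test file contains categories that the training file does not.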
Then I thought that maybe I should test the model with the test data, and I did this:

  > test
     V1
  1  sunny
  2  rainy
  3  hello
  4  cloud
  5  a
  6  b
  7  cloud
  8  d
  9  e
  10 f
  11 g
  12 hello
  > z <- subset(test, select = V1)
  > pred <- predict(model, z)
  Error in predict.svm(model, z) : test data does not match model !

What is wrong here? Can you please explain to me how I can test new data using the previously trained model? For two days I have asked everywhere and looked at many websites without finding a solution. It is frustrating because I think the logic behind the code is fine, but something is missing in the way I express it in R.

Thank you for your help
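One possible direction, offered as a sketch under stated assumptions and not as a confirmed fix: represent each sentence as word counts over a vocabulary taken from the training data, so that the training and test inputs always have the same columns. The helper `count_words` is hypothetical (it is not part of e1071), the data are typed inline instead of read from the CSV files, and the matrix interface `svm(x, y)` is used in place of the formula interface from the session above.

```r
# Hypothetical helper (not part of e1071): count how often each vocabulary
# word occurs in each sentence. Words outside the vocabulary are ignored.
count_words <- function(sentences, vocab) {
  tokens <- strsplit(tolower(as.character(sentences)), "[[:space:]]+")
  counts <- vapply(
    tokens,
    function(tok) as.numeric(table(factor(tok, levels = vocab))),
    numeric(length(vocab))
  )
  # vapply yields one column per sentence; reshape to one row per sentence.
  matrix(counts, nrow = length(tokens), ncol = length(vocab),
         byrow = TRUE, dimnames = list(NULL, vocab))
}

# Inline copies of the example data from the post.
train_text <- c("sunny", "sunny sunny", "sunny rainy sunny", "sunny cloud sunny",
                "rainy", "rainy rainy", "rainy sunny rainy", "rainy cloud rainy",
                "cloud", "cloud cloud", "cloud rainy cloud", "cloud sunny cloud")
train_label <- factor(rep(c("yes", "no", "maybe"), each = 4))
test_text <- c("sunny", "rainy", "hello", "cloud", "a", "b",
               "cloud", "d", "e", "f", "g", "hello")

# The vocabulary comes from the training sentences only.
vocab <- sort(unique(unlist(strsplit(train_text, "[[:space:]]+"))))
x_train <- count_words(train_text, vocab)
x_test <- count_words(test_text, vocab)

# Train and predict only if e1071 is installed.
if (requireNamespace("e1071", quietly = TRUE)) {
  model <- e1071::svm(x_train, train_label)
  pred <- predict(model, x_test)
  print(data.frame(text = test_text, pred = pred))
}
```

Test sentences made only of unknown words (such as "hello") become all-zero rows. The classifier still assigns them one of the three labels, so a separate rule would be needed if such sentences should be reported as unknown.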