thr3ads.net - R help - [R] Levels in new data fed to SVM [Jan 2013]

Claus O'Rourke

2013-Jan-08 20:14 UTC

[R] Levels in new data fed to SVM

Hi all,
I've encountered an issue using svm (e1071) in the specific case of
supplying new data which may not have the full range of levels that
were present in the training data.

I've constructed this really primitive example to illustrate the point:
> library(e1071)
> training.data <- data.frame(x = c("yellow","red","yellow","red"),
+                             a = c("alpha","alpha","beta","beta"),
+                             b = c("a", "b", "a", "c"))
> my.model <- svm(x ~ ., data = training.data)
> test.data <- data.frame(x = c("yellow","red"),
+                         a = c("alpha","beta"),
+                         b = c("a", "b"))
> predict(my.model, test.data)
Error in predict.svm(my.model, test.data) :
  test data does not match model !
> levels(test.data$b) <- levels(training.data$b)
> predict(my.model, test.data)
     1      2
yellow    red
Levels: red yellow

In the first case test.data$b does not have the level "c" and this
results in the input data being rejected. I've debugged this down to
the point of model matrix creation in the SVM R code. Once I fill up
the levels in the test data with the levels from the original data,
then there is no problem at all.
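
The mismatch described above can be reproduced without e1071. The following is a minimal sketch in base R, assuming (as the debugging above suggests) that the prediction code builds its design matrix with something like model.matrix. The formula ~ a + b is only a stand-in for the model's terms, and stringsAsFactors = TRUE is set explicitly because data.frame() no longer converts strings to factors by default since R 4.0.

```r
# Same toy data as in the post, with strings explicitly converted to factors
training.data <- data.frame(
  x = c("yellow", "red", "yellow", "red"),
  a = c("alpha", "alpha", "beta", "beta"),
  b = c("a", "b", "a", "c"),
  stringsAsFactors = TRUE
)
test.data <- data.frame(
  x = c("yellow", "red"),
  a = c("alpha", "beta"),
  b = c("a", "b"),
  stringsAsFactors = TRUE
)

# model.matrix() creates one dummy column per non-reference factor level,
# so the number of columns depends on the levels each factor declares
train.mm <- model.matrix(~ a + b, data = training.data)
test.mm  <- model.matrix(~ a + b, data = test.data)

colnames(train.mm)  # includes a column for level "c" of b
colnames(test.mm)   # has no such column, since test.data$b only knows "a" and "b"
```

With one column fewer in the test matrix, a check that compares the new data against what the model was trained on is bound to fail.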

Assuming my test data has to come from another source where the number
of category levels seen might not always be as large as those for the
original training data, is there a better way I should be handling
this?

Thanks

Uwe Ligges

2013-Jan-10 12:47 UTC

[R] Levels in new data fed to SVM

On 08.01.2013 21:14, Claus O'Rourke wrote:
> Hi all,
> I've encountered an issue using svm (e1071) in the specific case of
> supplying new data which may not have the full range of levels that
> were present in the training data.
>
> I've constructed this really primitive example to illustrate the point:
>
>> library(e1071)
>> training.data <- data.frame(x = c("yellow","red","yellow","red"),
>>                             a = c("alpha","alpha","beta","beta"),
>>                             b = c("a", "b", "a", "c"))
>> my.model <- svm(x ~ ., data = training.data)
>> test.data <- data.frame(x = c("yellow","red"),
>>                         a = c("alpha","beta"),
>>                         b = c("a", "b"))
>> predict(my.model, test.data)
> Error in predict.svm(my.model, test.data) :
>    test data does not match model !
>>
>> levels(test.data$b) <- levels(training.data$b)
>> predict(my.model, test.data)
>       1      2
> yellow    red
> Levels: red yellow
>
> In the first case test.data$b does not have the level "c" and this
> results in the input data being rejected. I've debugged this down to
> the point of model matrix creation in the SVM R code. Once I fill up
> the levels in the test data with the levels from the original data,
> then there is no problem at all.
>
> Assuming my test data has to come from another source where the number
> of category levels seen might not always be as large as those for the
> original training data, is there a better way I should be handling
> this?

You have to tell the factor which levels are possible; the data do not
necessarily contain an example of each one.
That means:

levels(test.data$b) <- c("a", "b", "c")
predict(my.model, test.data)

will help.

Best,
Uwe Ligges
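
A slightly more defensive variant of this fix can be sketched in base R. Assigning to levels() relabels the existing levels by position, so it is only safe when the test factor's levels are a leading subset of the training levels in the same order, as happens to be the case here. Rebuilding each column with factor(..., levels = ) matches by value instead. The helper name align_levels is hypothetical and not part of e1071, and stringsAsFactors = TRUE is set explicitly because data.frame() no longer creates factors by default since R 4.0.

```r
# Hypothetical helper (not part of e1071): re-encode every column that is a
# factor in `reference` using the level set recorded in `reference`.
align_levels <- function(new_data, reference) {
  for (col in intersect(names(new_data), names(reference))) {
    if (is.factor(reference[[col]])) {
      # factor() with explicit levels matches by value and keeps the
      # training order; values never seen in training become NA
      new_data[[col]] <- factor(as.character(new_data[[col]]),
                                levels = levels(reference[[col]]))
    }
  }
  new_data
}

training.data <- data.frame(
  x = c("yellow", "red", "yellow", "red"),
  a = c("alpha", "alpha", "beta", "beta"),
  b = c("a", "b", "a", "c"),
  stringsAsFactors = TRUE
)
test.data <- data.frame(
  x = c("yellow", "red"),
  a = c("alpha", "beta"),
  b = c("a", "b"),
  stringsAsFactors = TRUE
)

aligned <- align_levels(test.data, training.data)
levels(aligned$b)  # now carries the training levels, including "c"
```

The aligned data frame can then be passed to predict(my.model, aligned) in place of test.data. Values that never occurred in training turn into NA, so they need to be handled separately before predicting.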

