Matthew Wood
2014-Feb-21 19:50 UTC
[R] [e1071] Features that are factors when exporting a model with write.svm
I have a trained SVM that I want to export with write.svm and eventually use in libSVM. Some of my features are factors. Standard libSVM only works with features that are doubles, so I need to figure out how my features should be represented and used.

How does e1071 treat factors in an SVM? For feature "foo" with values "a" and "b" I'm assuming it's something like foo_a (0 or 1) and foo_b (0 or 1). Is that right?

Do factors get treated differently in an SVM? If I convert the factors to integers for libSVM, I'll lose the information that such a feature is categorical and doesn't take on a range of values. Is that going to cause problems? I don't know if the model takes that into account.

When using write.svm, a scale file is also output. My scale file is missing exactly as many rows as I have features that are factors. That's another indication to me that the factors are causing issues.

Thanks.
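To make the two candidate encodings in the question concrete, here is a minimal base-R sketch. It does not call e1071 and makes no claim about what svm() or write.svm do internally; the data frame `d` and its columns `x1` and `foo` are made up for illustration.

```r
# Hypothetical toy data: one numeric feature and one two-level factor "foo".
d <- data.frame(x1  = c(0.5, 1.2, 3.1, 2.4),
                foo = factor(c("a", "b", "a", "b")))

# R's default treatment contrasts: an intercept plus one indicator column
# for each non-reference level of the factor.
mm_default <- model.matrix(~ x1 + foo, data = d)
print(colnames(mm_default))

# Full indicator (one-hot) encoding: dropping the intercept gives every
# level of the factor its own 0/1 column, i.e. the foo_a / foo_b idea.
mm_onehot <- model.matrix(~ foo - 1, data = d)
print(mm_onehot)

# Plain integer codes: a single column that imposes an ordering on levels.
codes <- as.integer(d$foo)
print(codes)
```

Whichever encoding the model was trained on has to be reproduced exactly on the libSVM side, including the order of the levels.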
Matthew Wood
2014-Feb-21 20:33 UTC
[R] [e1071] Features that are factors when exporting a model with write.svm
I may have been able to answer my own questions by reading the e1071 source. It looks like the features are just converted to doubles with as.double(x). I haven't yet found where in the code it happens, but it also looks like the factors are not scaled, which would explain why rows are missing from my scale file.

On Fri, Feb 21, 2014 at 1:50 PM, Matthew Wood <doowttam at gmail.com> wrote:
> I have a trained SVM that I want to export with write.svm and
> eventually use in libSVM. Some of my features are factors. Standard
> libSVM only works with features that are doubles, so I need to figure
> out how my features should be represented and used.
>
> How does e1071 treat factors in an SVM? For feature "foo" with values
> "a" and "b" I'm assuming it's something like foo_a (0 or 1) and foo_b
> (0 or 1). Is that right?
>
> Do factors get treated differently in an SVM? If I convert the factors
> to integers for libSVM, I'll lose the information that a feature
> doesn't take on a range of values. Is that going to cause problems? I
> don't know if the model takes that into account.
>
> When using write.svm a scale file is also output. My scale file is
> missing the same number of rows as I have features that are factors.
> That's another indication to me that the factors are causing issues.
>
> Thanks.
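As a rough illustration of the two observations in this reply, the base-R sketch below shows what as.double() returns for a factor, and how scaling only the numeric columns produces fewer scale rows than there are features. This is an assumption-laden analogy, not e1071's actual code path; the names `foo`, `d`, `is_num` and `scale_table` are hypothetical.

```r
# A factor's levels are sorted alphabetically by default, and as.double()
# returns the position of each value in that level list, not an indicator.
foo <- factor(c("b", "a", "b", "c"))
print(levels(foo))
print(as.double(foo))

# Hypothetical data with two numeric features and one factor feature.
d <- data.frame(x1 = c(1, 2, 3, 4), x2 = c(10, 20, 30, 40), foo = foo)

# Scale only the numeric columns and collect their centers and scales.
is_num <- vapply(d, is.numeric, logical(1))
scaled <- scale(d[is_num])
scale_table <- cbind(center = attr(scaled, "scaled:center"),
                     scale  = attr(scaled, "scaled:scale"))

# Two rows for three features: the factor column contributes no row.
print(scale_table)
```

If the exported model really was trained on such integer codes, the same level-to-code mapping (here a = 1, b = 2, c = 3) would have to be applied to any data passed to libSVM.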